INTRODUCTION 
TO COMPUTER 

THEORY 


SECOND EDITION 


Daniel I. A. Cohen 

Hunter College 

City University of New York 



John Wiley & Sons, Inc. 

New York Chichester Brisbane Toronto Singapore Weinheim 








ACQUISITIONS EDITOR Regina Brooks 
MARKETING MANAGER Jay Kirsch 
SENIOR PRODUCTION EDITOR Tony VenGraitis 
DESIGN SUPERVISOR Anne Marie Renzi 
MANUFACTURING MANAGER Mark Cirillo 
ILLUSTRATION COORDINATOR Rosa Bryant 
PRODUCTION MANAGEMENT J. Carey Publishing Service 

This book was set in 10/12 Times Roman by Digitype and
printed and bound by Hamilton Printing. The cover was printed by Lehigh Press.

Recognizing the importance of preserving what has been written, it is a 
policy of John Wiley & Sons, Inc. to have books of enduring value published 
in the United States printed on acid-free paper, and we exert our best 
efforts to that end. 

The paper in this book was manufactured by a mill whose forest management programs include 
sustained yield harvesting of its timberlands. Sustained yield harvesting principles ensure that 
the number of trees cut each year does not exceed the amount of new growth. 

Copyright © 1991, 1997, by John Wiley & Sons, Inc. 

All rights reserved. Published simultaneously in Canada. 

Reproduction or translation of any part of 
this work beyond that permitted by Sections 
107 and 108 of the 1976 United States Copyright 
Act without the permission of the copyright 
owner is unlawful. Requests for permission 
or further information should be addressed to 
the Permissions Department, John Wiley & Sons, Inc.

0-471-13772-3 

Printed in the United States of America 




To Professor M.-P. Schützenberger
as a token of deep
and affectionate gratitude


During the preparation of this second edition Alonzo Church
passed away at the age of 92. As a mathematical logician he was a
theoretician par excellence and preeminent in the development of
Computer Theory. His students include Stephen C. Kleene, who
figures prominently in this book. When Alan Turing was working on
the consequences and ramifications of his model of computation, it
was to Gödel and Church in Princeton that he went to study. I too
was a student of Church’s. He was a formative influence on my
development—a blessed memory and a saintly man.










PREFACE 
TO THE FIRST EDITION 


It has become clear that some abstract Computer Theory should be included in the education 
of undergraduate Computer Science majors. 

Leaving aside the obvious worth of knowledge for its own sake, the terminology, notations,
and techniques of Computer Theory are necessary in the teaching of courses on computer
design, Artificial Intelligence, the analysis of algorithms, and so forth. Of all the programming
skills undergraduate students learn, two of the most important are the abilities to
recognize and manipulate context-free grammars and to understand the power of the recursive
interaction of parts of a procedure. Very little can be accomplished if each advanced
course has to begin at the level of defining rules of production and derivations. Every interesting
career a student of Computer Science might pursue will make significant use of some
aspects of the subject matter of this book.

Yet we find today that the subjects of Automata Theory, Formal Languages, and Turing
machines are almost exclusively relegated to the very advanced student. Only textbooks demanding
intense mathematical sophistication discuss these topics. Undergraduate Computer
Science majors are unlikely to develop the familiarity with set theory, logic, and the facility
with abstract manipulation early enough in their college careers to digest the material in the
existing excellent but difficult texts.

Bringing the level of sophistication to the exact point where it meets the expected preparation
of the intended student population is the responsibility of every carefully prepared
textbook. Of all the branches of Mathematics, Computer Science is one of the newest and
most independent. Rigorous mathematical proofs of the most profound theorems in this subject
can be constructed without the aid of Calculus, Number Theory, Algebra, or Topology.
Some degree of understanding of the notion of proof is, of course, required, but the techniques
employed are so idiosyncratic to this subject that it is preferable to introduce them to
the student from first principles. Characteristic methods, such as drawing accurate conclusions
from diagrams, analyzing graphs, or searching trees, are not tools with which a typical
mathematics major is familiar. Hardly any students come prepared for the convoluted surprise
of the Halting Problem. These then are the goals of this textbook: (1) to introduce a
student of Computer Science to the need for and the working of mathematical proof; (2) to
develop facility with the concepts, notations, and techniques of the theories of Automata,
Formal Languages, and Turing machines; and (3) to provide historical perspective on the
creation of the computer with a profound understanding of some of its capabilities and limitations.

Basically, this book is written for students with no presumed background of any kind. 
Every mathematical concept used is introduced from scratch. Extensive examples and 
illustrations spell out everything in detail to avoid any possibility of confusion. The bright
student is encouraged to read at whatever pace or depth seems appropriate. 

For their excellent care with this project I thank the staff at John Wiley & Sons: Richard
J. Bonacci, acquisitions editor, and Lorraine F. Mellon, Eugene Patti, Elaine Rauschal, and
Ruth Greif of the editorial and production staffs. Of the technical people who reviewed the
manuscript I thank Martin Kaliski, Adrian Tang, Martin Davis, and especially H. P. Edmundson,
whose comments were invaluable, and Martin J. Smith, whose splendid special support
was dispositive. Rarely has an author had an assistant as enthusiastic, dedicated, knowledgeable,
and meticulous as I was so fortunate to find in Mara Chibnik. Every aspect of this project
from the classnotes to the page proofs benefited immeasurably from her scrutiny. Very
little that is within these covers—except for the few mistakes inserted by mischievous Martians—does
not bear the mark of her relentless precision and impeccable taste. Every large
project is the result of the toil of the craftsmen and the sacrifice and forbearance of those
they were forced to neglect. Rubies are beneath their worth.


Daniel I. A. Cohen 


PREFACE 
TO THE SECOND EDITION 



In the first edition I intentionally omitted some topics because their discussion and/or proof 
involved mathematics that I felt was hopelessly beyond the scope of my intended audience. 
Students have not gotten more mathematically sophisticated but I have figured out how to 
demystify some of these themes in a much simpler way with no loss of rigor. Along the way 
various proofs that used to be cumbersome have been somewhat streamlined, and some 
embarrassing errors have been unearthed and demolished. 

Undergraduate Computer Science majors generally do not speak the language of mathematical
symbolism fluently, nor is it important at their level that they do more than try. The
value of mathematical iconography is that it enables professionals to perform their research
and communicate their results more efficiently. The symbolism is not a profound discovery
in and of itself. It is at best a means, not an end. To those to whom it is opaque, it is a hindrance
to understanding. When this happens it is mathematically dysfunctional and a pedagogical
anathema. Anyone who believes that $\{j : 1 \le j \le n\}$ is somehow more rigorous than
$\{1, 2, \ldots, n\}$ is misguided. He has forgotten how the typography “$1 \le j \le n$” was defined
to him in the first place. All mathematical symbolism can be reduced to human language because
it is through iterations of human language substitutes that it was defined initially. Instead
of introducing “mathematics” in an alienating form that only has to be expounded anyway,
I prefer to skip the pretentious detour and provide the explanation itself directly.
Computer science has needlessly carried an inferiority complex among the branches of
mathematics, causing a defensive embedding into mainstream symbolism to lend it an aura
of legitimacy. Yet it has been, as Hilbert himself predicted, one of the principal departments
of mathematical discovery in the last century.

Still no pretense is made to encyclopedic completeness. This textbook is an introduction
to computer theory and contains the minimum collegiate requirements of theory for computer
science majors. No, I have not added a chapter on NP-completeness, primitive and partial
recursion, program verification, artificial intelligence, or Renaissance architecture.
These are all topics worthy of being included in some course, but to squeeze them in here
would necessarily displace some of the more pertinent and fundamental aspects of theory,
and would thereby disadvantage the student.

High on my list of cheap tricks is the inclusion of material in textbooks that is never
meant to be covered in the intended course in the first place. I have heard members of textbook
selection committees who say, “Let’s adopt X’s elementary calculus text because he
has a chapter on general relativity while our current textbook contains only calculus.” Salesmanship
should not be the business of textbook authors—educating students should. Making
students pay for 300 extra pages of material that is not intended to be covered in the
course harms them in financial, muscular, and psychological ways.

Ideally a textbook should begin at the level of understanding of the students taking the
course. It should include all the material they have contracted to learn presented in a fashion
maximally suited for them to absorb. When it has completed the syllabus it should stop. Allowances
may be made for instructor discretion in choosing material that is basic to the
course and in the selection of which topics warrant special emphasis. However, there are
some fanatics who have the grandiose notion that to be a great teacher is to stuff more material
into a course than their students can learn. I view this as a sheer and simple breach of contract.
Let these zealots adopt a graduate textbook and let their students protest accordingly.
There is no comparison between the error of covering too little and covering too much. To
attempt to cover too much is to rob the students of the chance to learn and to undermine their
self-confidence.

This book is unabashedly easy to read. It is intentionally slow-paced and repetitive. Let
the bright student blitz through it, but let the slower student find comfort and elucidation.
The nuances in this material are unlike anything (mathematical or otherwise) seen before in
a course or textbook. A leisurely stroll through these charming gems can be enjoyable, stimulating,
and rewarding. My duty to computer science students is to protect them against their
own fear of mathematics, to demonstrate to them that a proof is no more or less than an understanding
of why the theorem is true, and to allow them to savor the intellectual richness
of the theoretical foundations of what is ultimately the most important invention since antiquity.

Is this book ideal? That would be unlikely, wouldn’t it? But it is designed with good scientific
intentions and sincere concern for those interested in learning.

It gives me pleasure to thank Chanah Brenenson who served as the technical editor and 
tireless critic to this edition. May she live long and prosper. 
CHAPTER 1


Background 


The twentieth century has been filled with the most incredible shocks and surprises: the theory
of relativity, the rise and fall of communism, psychoanalysis, nuclear war, television,
moon walks, genetic engineering, and so on. As astounding as any of these is the advent of
the computer and its development from a mere calculating device into what seems like a
“thinking machine.”

The birth of the computer was not wholly independent of the other events of this century.
Its inception was certainly impelled if not provoked by war, its development was facilitated
by the evolution of psycho-linguistics, and it has interacted symbiotically with all
the aforementioned upheavals. The history of the computer is a fascinating story; however, it
is not the subject of this course. We are concerned instead with the theory of computers,
which means that we shall form several mathematical models that will describe with varying
degrees of accuracy parts of computers, types of computers, and similar machines. The concept
of a “mathematical model” is itself a very modern construct. It is, in the broadest sense,
a game that describes some important real-world behavior. Unlike games that are simulations
used for practice or simply for fun, mathematical models abstract, simplify, and
codify to the point that the subtle observations and conclusions that can be made about the
game relate back in a meaningful way to the physical world, shedding light on that which
was not obvious before. We may assert that chess is a mathematical model for war, but it is a
very poor model because wars are not really won by the simple assassination of the leader of
the opposing country.

The adjective “mathematical” in this phrase does not necessarily mean that classical
mathematical tools such as Euclidean geometry or calculus will be employed. Indeed, these
areas are completely absent from the present volume. What is mathematical about the models
we shall be creating and analyzing is that the only conclusions that we shall be allowed
to draw are claims that can be supported by pure deductive reasoning; in other words, we are
obliged to prove the truth about whatever we discover. Most professions, even the sciences,
are composed of an accumulation of wisdom in the form of general principles and rules that
usually work well in practice, such as “on such and such a wood we recommend this undercoat,”
or “these symptoms typically respond to a course of medication X.” This is the complete
opposite of the type of thing we are going to be doing. While most of the world is
(correctly) preoccupied by the question of how best to do something, we shall be completely
absorbed with the question of whether certain tasks can be done at all. Our main conclusions
will be of the form “this can be done” or “this can never be done.” When we reach conclusions
of the second type, we shall mean not just that techniques for performing these tasks
are unknown at the present time, but that such techniques will never exist in the future no
matter how many clever people spend millennia attempting to discover them.

The nature of our discussion will be the frontiers of capability in an absolute and timeless
sense. This is the excitement of mathematics. The fact that the mathematical models that
we create serve a practical purpose through their application to computer science, both in the 
development of structures and techniques necessary and useful to computer programming 
and in the engineering of computer architecture, means that we are privileged to be playing a 
game that is both fun and important to civilization at the same time. 

The term computer is practically never encountered in this book—we do not even define
the term until the final pages. The way we shall be studying computers is to build
mathematical models, which we shall call machines, and then to study their limitations by
analyzing the types of inputs on which they operate successfully. The collection of these
successful inputs we shall call the language of the machine, by analogy to humans who can
understand instructions given to them in one language but not another. Every time we introduce
a new machine we will learn its language, and every time we develop a new language
we shall try to find a machine that corresponds to it. This interplay between languages and
machines will be our way of investigating problems and their potential solution by automatic
procedures, often called algorithms, which we shall describe in a little more detail
shortly.

The history of the subject of computer theory is interesting. It was formed by fortunate 
coincidences, involving several seemingly unrelated branches of intellectual endeavor. A 
small series of contemporaneous discoveries, by very dissimilar people, separately motivated,
flowed together to become our subject. Until we have established more of a foundation,
we can only describe in general terms the different schools of thought that have melded
into this field. 

The most fundamental component of computer theory is the theory of mathematical 
logic. As the twentieth century started, mathematics was facing a dilemma. Georg Cantor 
had recently invented the theory of sets (unions, intersections, inclusion, cardinality, etc.). 
But at the same time he had discovered some very uncomfortable paradoxes—he created 
things that looked like contradictions in what seemed to be rigorously proven mathematical 
theorems. Some of his unusual findings could be tolerated (such as the idea that infinity 
comes in different sizes), but some could not (such as the notion that some set is bigger than 
the universal set). This left a cloud over mathematics that needed to be resolved. 

To some the obvious solution was to ignore the existence of set theory. Some others 
thought that set theory had a disease that needed to be cured, but they were not quite sure 
where the trouble was. The naive notion of a general “set” seemed quite reasonable and innocent.
When Cantor provided sets with a mathematical notation, they should have become
mathematical objects capable of having theorems about them proven. All the theorems that 
dealt with finite sets appeared to be unchallengeable, yet there were definite problems with 
the acceptability of infinite sets. In other branches of mathematics the leap from the finite to 
the infinite can be made without violating intuitive notions. Calculus is full of infinite sums 
that act much the way finite sums do; for example, if we have an infinite sum of infinitesimals
that add up to 3, when we double each term, the total will be 6. The Euclidean notion
that the whole is the sum of its parts seems to carry over to infinite sets as well; for example, 
when the even integers are united with the odd integers, the result is the set of all integers. 
Yet, there was definitely an unsettling problem in that some of Cantor’s “theorems” gave 
contradictory results. 

In the year 1900, David Hilbert, as the greatest living mathematician, was invited to address
an international congress to predict what problems would be important in the century
to come. Either due to his influence alone, or as a result of his keen analysis, or as a tribute
to his gift for prophecy, for the most part he was completely correct. The 23 areas he indicated
in that speech have turned out to be the major thrust of mathematics for the twentieth
century. Although the invention of the computer itself was not one of his predictions, several 
of his topics turn out to be of seminal importance to computer science. 

First of all, he wanted the confusion in set theory resolved. He wanted a precise axiomatic
system built for set theory that would parallel the one that Euclid had laid down for
geometry. In Euclid’s classic texts, each true proposition is provided with a rigorous proof in
which every line is either an axiom or follows from the axioms and previously proven theorems
by a specified small set of rules of inference. Hilbert thought that such an axiom system
and set of rules of inference could be developed to avoid the paradoxes Cantor (and others)
had found in set theory.

Second, Hilbert was not merely satisfied that every provable result should be true; he 
also presumed that every true result was provable. And even more significant, he wanted a 
methodology that would show mathematicians how to find this proof. He had in his mind a 
specific model of what he wanted. 

In the nineteenth century, mathematicians had completely resolved the question of solving
systems of linear equations. Given any algebraic problem having a specified number of
linear equations, in a specified set of unknowns, with specified coefficients, a system had
been developed (called linear algebra) that would guarantee one could decide whether the
equations had any simultaneous solution at all, and find the solutions if they did exist.

This would have been an even more satisfactory situation than existed in Euclidean 
geometry at the time. If we are presented with a correct Euclidean proposition relating line 
segments and angles in a certain diagram, we have no guidance as to how to proceed to produce
a mathematically rigorous proof of its truth. We have to be creative—we may make
false starts, we may get completely lost, frustrated, or angry. We may never find the proof,
even if many simple, short proofs exist. Linear algebra guarantees that none of this will ever
happen with equations. As long as we are tireless and precise in following the rules, we must
prevail, no matter how little imagination we ourselves possess. Notice how well this describes
the nature of a computer. Today, we might rephrase Hilbert’s request as a demand for
a set of computer programs to solve mathematical problems. When we input the problem, 
the machine generates the proof. 
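To make the guarantee of linear algebra concrete, here is a minimal sketch in Python, assuming Gauss-Jordan elimination over exact rational numbers. The function name solve_linear_system is hypothetical, and the method is the standard elimination procedure, not a reconstruction of any particular historical presentation. The point is only that following the rules mechanically always ends in one of two verdicts: no simultaneous solution exists, or here is one.

```python
from fractions import Fraction


def solve_linear_system(coeffs, rhs):
    """Decide whether coeffs * x = rhs has a solution (hypothetical helper).

    Returns one solution as a list of Fractions, or None if the equations
    are inconsistent. Free variables, if any, are set to 0.
    """
    n_rows = len(coeffs)
    n_cols = len(coeffs[0]) if coeffs else 0
    # Augmented matrix with exact rational arithmetic (no rounding errors).
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(coeffs, rhs)]
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c holds a free variable
        m[r], m[pivot] = m[pivot], m[r]
        p = m[r][c]
        m[r] = [v / p for v in m[r]]  # scale so the pivot becomes 1
        # Eliminate column c from every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    # Remaining rows have all-zero coefficients; "0 = nonzero" means no solution.
    for i in range(r, n_rows):
        if m[i][n_cols] != 0:
            return None
    x = [Fraction(0)] * n_cols
    for i, c in enumerate(pivot_cols):
        x[c] = m[i][n_cols]
    return x
```

For example, the system x + y = 3, x - y = 1 yields x = 2, y = 1, while the contradictory pair x + y = 1, 2x + 2y = 3 yields None. No creativity is required at any step, which is exactly the property Hilbert hoped to find for all of mathematics.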

It was not easy for mathematicians to figure out how to follow Hilbert’s plan. Mathematicians
are usually in the business of creating the proofs themselves, not the proof-generating
techniques. What had to be invented was a whole field of mathematics that dealt with
algorithms or procedures or programs (we use these words interchangeably). From this we 
see that even before the first computer was ever built, some people were asking the question 
of what programs can be written. It was necessary to codify the universal language in which 
algorithms could be stated. Addition and circumscribing circles were certainly allowable 
steps in an algorithm, but such activities as guessing and trying infinitely many possibilities 
at once were definitely prohibited. The language of algorithms that Hilbert required evolved 
in a natural way into the language of computer programs. 

The road to studying algorithms was not a smooth one. The first bump occurred in 1931
when Kurt Gödel proved that there was no algorithm to provide proofs for all the true statements
in mathematics. In fact, what he proved was even worse. He showed that either there
were some true statements in mathematics that had no proofs, in which case there were certainly
no algorithms that could provide these proofs, or else there were some false statements
that did have proofs of their correctness, in which case the algorithm would be disastrous.

Mathematicians then had to retreat to the question of what statements do have proofs
and how these proofs can be generated. The people who worked on this problem, Alonzo
Church, Stephen Kleene, Emil Post, Andrei Andreevich Markov, John von Neumann, and
Alan Turing, worked mostly independently and came up with an extraordinarily simple
set of building blocks that seemed to be the atoms from which all mathematical algorithms
can be composed. They each fashioned various (but similar) versions of a universal
model for all algorithms—what, from our perspective, we would call a universal algorithm
machine. Turing then went one step further. He proved that there were
mathematically definable fundamental questions about the machine itself that the machine
could not answer.

On the one hand, this theorem completely destroyed all hope of ever achieving any part 
of Hilbert’s program of mechanizing mathematics, or even of deciding which classes of 
problems had mechanical answers. On the other hand, Turing’s theoretical model for an algorithm
machine employing a very simple set of mathematical structures held out the possibility
that a physical model of Turing’s idea could actually be constructed. If some human
could figure out an algorithm to solve a particular class of mathematical problem, then the 
machine could be told to follow the steps in the program and execute this exact sequence of 
instructions on any inserted set of data (tirelessly and with complete precision). 

The electronic discoveries that were needed for the implementation of such a device included
vacuum tubes, which just coincidentally had been developed recently for engineering
purposes completely unrelated to the possibility of building a calculating machine. This was
another fortuitous phenomenon of this period of history. All that was required was the impetus
for someone with a vast source of money to be motivated to invest in this highly speculative
project. It is practically sacrilegious to maintain that World War II had a serendipitous
impact on civilization no matter how unintentional, yet it was exactly in this way that the 
first computer was born—sponsored by the Allied military to break the German secret code, 
with Turing himself taking part in the construction of the machine. 

What started out as a mathematical theorem about mathematical theorems—an abstraction
about an abstraction—became the single most practically applied invention since the
wheel and axle. Not only was this an ironic twist of fate, but it all happened within the remarkable
span of 10 years. It was as incredible as if a mathematical proof of the existence of
intelligent creatures in outer space were to provoke them to land immediately on Earth. 

Independently of all the work being done in mathematical logic, other fields of science 
and social science were beginning to develop mathematical models to describe and analyze 
difficult problems of their own. As we have noted before, there is a natural correspondence 
between the study of models of computation and the study of linguistics in an abstract and 
mathematical sense. It is also natural to assume that the study of thinking and learning—branches
of psychology and neurology—plays an important part in understanding and facilitating
computer theory. What is again of singular novelty is the historical fact that, rather
than turning their attention to mathematical models to computerize their own applications,
their initial development of mathematical models for aspects of their own science directly
aided the evolution of the computer itself. It seems that half the intellectual forces in the
world were leading to the invention of the computer, while the other half were producing applications
that were desperate for its arrival.

Two neurophysiologists, Warren McCulloch and Walter Pitts, constructed a mathematical
model for the way in which sensory receptor organs in animals behave. The model they
constructed for a “neural net” was a theoretical machine of the same nature as the one Turing 
invented, but with certain limitations. 

Modern linguists, some influenced by the prevalent trends in mathematical logic and 
some by the emerging theories of developmental psychology, had been investigating a very 
similar subject: What is language in general? How could primitive humans have developed 
language? How do people understand it? How do they learn it as children? What ideas can 
be expressed, and in what ways? How do people construct sentences from the ideas in their
minds? 

Noam Chomsky created the subject of mathematical models for the description of languages
to answer these questions. His theory grew to the point where it began to shed light
on the study of computer languages. The languages humans invented to communicate with
one another and the languages necessary for humans to communicate with machines shared
many basic properties. Although we do not know exactly how humans understand language,
we do know how machines digest what they are told. Thus, the formulations of mathematical
logic became useful to linguistics, a previously nonmathematical subject. Metaphorically,
we could say that the computer then took on linguistic abilities. It became a word processor,
a translator, and an interpreter of simple grammar, as well as a compiler of computer languages.
The software invented to interpret programming languages was applied to human
languages as well. One point that will be made clear in our studies is why computer languages
are easy for a computer to understand, whereas human languages are very difficult.

Because of the many influences on its development, the subject of this book goes by
various names. It includes three major fundamental areas: the theory of automata, the theory
of formal languages, and the theory of Turing machines. This book is divided into
three parts corresponding to these topics.

Our subject is sometimes called computation theory rather than computer theory, because
the items that are central to it are the types of tasks (algorithms or programs) that can
be performed, not the mechanical nature of the physical computer itself. However, the name
“computation” is misleading, since it popularly connotes arithmetical operations, which make up
only a fraction of what computers can do. The term computation is inaccurate when describing
word processing, sorting, and searching and awkward in discussions of program
verification. Just as the term “number theory” is not limited to a description of calligraphic
displays of number systems but focuses on the question of which equations can be solved in
integers, and the term “graph theory” does not include bar graphs, pie charts, and histograms,
so too “computer theory” need not be limited to a description of physical machines
but can focus on the question of which tasks are possible for which machines.

We shall study different types of theoretical machines that are mathematical models for 
actual physical processes. By considering the possible inputs on which these machines can 
work, we can analyze their various strengths and weaknesses. We then arrive at what we 
may believe to be the most powerful machine possible. When we do, we shall be surprised to 
find tasks that even it cannot perform. This will be our ultimate result, that no matter what 
machine we build, there will always be questions that are simple to state that it cannot answer.
Along the way, we shall begin to understand the concept of computability, which is
the foundation of further research in this field. This is our goal. Computer theory extends 
further to such topics as complexity and verification, but these are beyond our intended 
scope. Even for the topics we do cover—automata, languages, Turing machines—much 
more is known than we present here. As intriguing and engaging as the field has proven so 
far, with any luck the most fascinating theorems are yet to be discovered. 


CHAPTER 2 


Languages 



LANGUAGES IN THE ABSTRACT 


In English we distinguish the three different entities: letters, words, and sentences. There is a 
certain parallelism between the fact that groups of letters make up words and the fact that 
groups of words make up sentences. Not all collections of letters form a valid word, and not 
all collections of words form a valid sentence. The analogy can be continued. Certain groups 
of sentences make up coherent paragraphs, certain groups of paragraphs make up coherent 
stories, and so on. What is more important to note is that, to a large degree, humans agree on 
which sequences are valid and which are not. How do they do that? 

This situation also exists with computer languages. Certain character strings are recognizable words (DO, IF, END, . . .). Certain strings of words are recognizable commands. Certain sets of commands become a program (with or without data) that can be compiled, which means translated into machine commands.

To construct a general theory that unifies all these examples, it is necessary for us to adopt a definition of a “language structure,” that is, a structure in which the decision of whether a given string of units constitutes a valid larger unit is not a matter of guesswork, but is based on explicitly stated rules. For our purposes at this time, it is more important that there be rules for recognizing whether an input is a valid communication than rules for deciphering exactly what the communication means. It is important that the program compiles whether or not it does what the programmer intended. If it compiles, it was a valid example of a statement or communication in the language, and the machine is responsible for executing the specified sequence of instructions. What we are looking for are ways of determining whether the input is a valid communication. Just as with any set, it is important for a language to be able to tell who is in and who is out.

It is very hard to state all the rules for the language “spoken English,” since many seemingly incoherent strings of words are actually understandable utterances. This is due to slang, idiom, dialect, and our ability to interpret poetic metaphor and to correct unintentional grammatical errors in the sentences we hear. However, as a first step to defining a general theory of abstract languages, it is right for us to insist on precise rules, especially since computers are not quite as forgiving about imperfect input commands as listeners are about informal speech.

When we call our study the theory of formal languages, the word “formal” refers to the fact that all the rules for the language are explicitly stated in terms of what strings of symbols can occur. No liberties are tolerated, and no reference to any “deeper understanding” is required. Language will be considered solely as symbols on paper and not as expressions of ideas in the minds of humans. In this basic model, language is not communication among intellects, but a game of symbols with formal rules. The term “formal” used here emphasizes that it is the form of the string of symbols we are interested in, not the meaning.

We begin with only one finite set of fundamental units out of which we build structures. We shall call this the alphabet. A certain specified set of strings of characters from the alphabet will be called the language. Those strings that are permissible in the language we call words. The symbols in the alphabet do not have to be Latin letters, and the sole universal requirement for a possible string is that it contains only finitely many symbols. The question of what it means to “specify” a set of strings is, in reality, the major issue of this book.

We shall wish to allow a string to have no letters. This we call the empty string or null string, and we shall denote it by the symbol Λ. No matter what “alphabet” we are considering, the null string is always Λ, and for all languages the null word, if it is a word in the language, is also Λ. Two words are considered the same if all their letters are the same and in the same order, so there is only one possible word of no letters. For clarity, we usually do not allow the symbol Λ to be part of the alphabet for any language.

There is a subtle but important difference between the word that has no letters, Λ, and the language that has no words. We shall denote the language that has no words by the standard symbol for the null set, Φ. It is not true that Λ is a word in the language Φ, since this language has no words at all. If a certain language L does not contain the word Λ and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {Λ}. This language is not the same as L. On the other hand, L + Φ is the same as L, since no new words have been added.

The fact that Φ is a language even though it has no words will turn out to be an important distinction. If we have a method for producing a language and in a certain instance the method produces nothing, we can say either that the method failed miserably or that it successfully produced the language Φ. We shall face just such a situation later.

The most familiar example of a language for us is English. The alphabet is the usual set of letters plus the apostrophe and hyphen. Let us denote the whole alphabet by the Greek letter capital sigma:

Σ = {a b c d e . . . z ' -}

It is customary to use this symbol to denote whichever collection of letters forms the alphabet for the words in the language L. This is not because the Greek word for “alphabet” starts with the letter sigma; the Greek word for alphabet is alphabetos and starts with an A. However, this subject started as a branch of mathematics well before computers and desktop publishing, and when researchers were looking for a symbol less ambiguous than A to denote alphabet, they employed the special characters already found in mathematical printing: Σ and Γ, as well as Φ and Λ for other purposes. This has become a time-honored tradition. To some it makes computer theory seem more mathematical, and to some this is an advantage. Our investigations will be completely mathematical with as little resort to irrelevant symbolic complexity as possible.

Sometimes we shall list a set of elements separated by spaces and sometimes by commas. If we wished to be supermeticulous, we would also include in Σ the uppercase letters and the seldom-used diacritical marks.

We can now specify which strings of these letters are valid words in our language by listing them all, as is done in a dictionary. It is a long list, but a finite list, and it makes a perfectly good definition of the language. If we call this language ENGLISH-WORDS, we may write

ENGLISH-WORDS = {all the words in a standard dictionary}

In the preceding line, we have intentionally mixed mathematical notation (the equal sign and the braces denoting a set) and a prose phrase. This results in perfectly understandable communication; we take this liberty throughout. All of our investigations will be agglomerates of informal discussion and precise symbolism. Mathematical symbolism is of value only when it is somehow better than seeing the same thought expressed in human language, for example, when it is more understandable, or more concise in cases where space is a problem, or when it points out similarities between items whose resemblance is otherwise obscure, and so on. The belief that mathematical symbolism is more rigorous and therefore more accurate than English prose is quite ridiculous, since every mathematical symbol was defined in English in the first place and every mathematical formula can be translated into English if need be. There are two problems with pure mathematical symbolism: It alienates some who for want of familiarity could otherwise understand the concepts being expressed, and it often gives one a false sense of precision. Many, many false proofs have been published in mathematics journals because their notation was so opaque that it confused the editors. Since the goal in a textbook is not to minimize the space required to explain concepts but to maximize the chance of understanding, we shall find little use for complex symbolism.

Only a language with finitely many words can be defined by an all-inclusive list called a 
dictionary. If we tried to define a language of infinitely many words by an infinite list, we 
would arrive at the problem of the impossibility of searching this list (even if it is arranged 
in alphabetical order) to determine whether a given word is in the language or not. But even 
though there are tricks to overcome the searching problem (as we shall soon see), we do not 
allow the possibility of defining a language by an infinite dictionary. How could we be 
handed an infinite dictionary? It would have to be described to us in some manner, but then 
the description and not the dictionary would be the language definition. 

Returning to the language of ENGLISH-WORDS, we note that this is not what we usually mean by “English.” To know all the words in a finite language like English does not imply the ability to create a viable sentence.

Of course, the language ENGLISH-WORDS, as we have specified it, does not have any grammar. If we wish to make a formal definition of the language of the sentences in English, we must begin by saying that this time our basic alphabet is the entries in the dictionary. Let us call this alphabet Γ, the capital gamma:

Γ = {the entries in a standard dictionary, plus a blank space, plus the usual punctuation marks}

In order to specify which strings of elements from Γ produce valid words in the language ENGLISH-SENTENCES, we must rely on the grammatical rules of English. This is because we could never produce a complete list of all possible words in this language; that would have to be a list of all valid English sentences. Theoretically, there are infinitely many different words in the language ENGLISH-SENTENCES. For example,

I ate one apple. 

I ate two apples. 

I ate three apples. 


The trick of defining the language ENGLISH-SENTENCES by listing all the rules of 
English grammar allows us to give a finite description of an infinite language. 

If we go by the rules of grammar only, many strings of alphabet letters seem to be valid words; for example, “I ate three Tuesdays.” In a formal language we must allow this string. It is grammatically correct; only its meaning reveals that it is ridiculous. Meaning is something we do not refer to in formal languages. As we make clear in Part II of this book, we are primarily interested in syntax alone, not semantics or diction. We shall be like the bad teacher who is interested only in the correct spelling, not the ideas in a homework composition.

In general, the abstract languages we treat will be defined in one of two ways. Either they will be presented as an alphabet and the exhaustive list of all valid words, or else they will be presented as an alphabet and a set of rules defining the acceptable words. The set of rules defining English is a grammar in a very precise sense. We shall take a much more liberal view about what kinds of “sets of rules” define languages.

Earlier we mentioned that we could define a language by presenting the alphabet and then specifying which strings are words. The word “specify” is trickier than we may at first suppose. Consider this example of the language called MY-PET. The alphabet for this language is

{a c d g o t}

There is only one word in this language, and for our own perverse reasons we wish to specify it by this sentence:

If the Earth and the Moon ever collide, then

MY-PET = {cat}

but, if the Earth and the Moon never collide, then

MY-PET = {dog}

One or the other of these two events will occur, but at this point in the history of the universe, it is impossible to be certain whether the word dog is or is not in the language MY-PET.

This sentence is therefore not an adequate specification of the language MY-PET because it is not useful. To be an acceptable specification of a language, a set of rules must enable us to decide, in a finite amount of time, whether a given string of alphabet letters is or is not a word in the language. Notice also that we never made it a requirement that all the letters in the alphabet need to appear in the words selected for the language. English itself used to have a letter called “eth” that has thankfully disappeared. We could add it back to the alphabet of letters and leave the language ENGLISH-WORDS unchanged.

INTRODUCTION TO DEFINING LANGUAGES

The set of language-defining rules can be of two kinds. They can either tell us how to test a string of alphabet letters that we might be presented with, to see if it is a valid word, or they can tell us how to construct all the words in the language by some clear procedures. We investigate this distinction further in the next chapter.

Let us consider some simple examples of languages. If we start with an alphabet having only one letter, the letter x,

Σ = {x}

we can define a language by saying that any nonempty string of alphabet characters is a 
word: 

L₁ = {x xx xxx xxxx . . .}

We could write this in an alternate form: 


L₁ = {xⁿ for n = 1 2 3 . . .}

where we have identified letter juxtaposition with algebraic multiplication. We shall see that this is sometimes a messy business.

Because of the way we have defined it, this language does not include the null string. We could have defined it so as to include Λ, but we did not.

In this language, as in any other, we can define the operation of concatenation, in which two strings are written down side by side to form a new longer string. In this example, when we concatenate the word xxx with the word xx, we obtain the word xxxxx. The words in this language are clearly analogous to the positive integers, and the operation of concatenation is analogous to addition:

xⁿ concatenated with xᵐ is the word xⁿ⁺ᵐ
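The analogy with addition can be checked mechanically. The Python sketch below is only an illustration, assuming that words of L₁ are modeled as ordinary strings of the character x and that concatenation is Python's string operator +; the names in_L1 and concat are our own, not notation from the text.

```python
def in_L1(w: str) -> bool:
    """Membership in L1 = {x^n for n = 1 2 3 ...}: a nonempty string of x's."""
    return len(w) >= 1 and set(w) == {"x"}


def concat(u: str, v: str) -> str:
    """Concatenation: write u and v side by side to form a longer string."""
    return u + v


u, v = "xxx", "xx"
uv = concat(u, v)
print(uv)                          # xxxxx
# x^n concatenated with x^m is x^(n+m): the lengths add like positive integers.
print(len(uv) == len(u) + len(v))  # True
print(in_L1(uv))                   # True: L1 is closed under concatenation
```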

It will often be convenient for us to designate the words in a given language by new symbols, that is, other than the ones in the alphabet. For example, we could say that the word xxx is called a and that the word xx is b. Then to denote the word formed by concatenating a and b, we write the letters side by side:

ab = xxxxx

It is not always true that when two words are concatenated they produce another word in the language. For example, if the language is

L₂ = {x xxx xxxxx xxxxxxx . . .}

= {xᵒᵈᵈ}

= {x²ⁿ⁺¹ for n = 0 1 2 3 . . .}

then a = xxx and b = xxxxx are both words in L₂, but their concatenation ab = xxxxxxxx is not in L₂. Notice that the alphabet for L₂ is the same as the alphabet for L₁. Notice also the liberty we took with the middle definition.
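A similar sketch, again modeling words as Python strings and using a hypothetical helper in_L2, confirms that two words of L₂ can concatenate to a string outside L₂, because two odd lengths add up to an even one.

```python
def in_L2(w: str) -> bool:
    """Membership in L2 = {x^(2n+1) for n = 0 1 2 3 ...}: a string of x's of odd length."""
    return set(w) <= {"x"} and len(w) % 2 == 1


a, b = "xxx", "xxxxx"
print(in_L2(a), in_L2(b))  # True True
print(in_L2(a + b))        # False: the concatenation has length 8, which is even
```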

In these simple examples, when we concatenate a with b, we get the same word as when we concatenate b with a. We can depict this by writing

ab = ba

But this relationship does not hold for all languages. In English when we concatenate “house” and “boat,” we get “houseboat,” which is indeed a word but distinct from “boathouse,” which is a different thing, not because they have different meanings, but because they are different words. “Merry-go-round” and “carousel” mean the same thing, but they are different words.

EXAMPLE 

Consider another language. Let us begin with the alphabet: 

Σ = {0 1 2 3 4 5 6 7 8 9}

and define the set of words: 

L₃ = {any finite string of alphabet letters that does not start with the letter zero}

This language L₃ then looks like the set of all positive integers written in base 10:

L₃ = {1 2 3 4 5 6 7 8 9 10 11 12 . . .}

We say “looks like” instead of “is” because L₃ is only a formal collection of strings of symbols. The integers have other mathematical properties. If we wanted to define the language L₃ so that it includes the string (word) 0, we could say:

L₃ = {any finite string of alphabet letters that, if it starts with a 0, has no more letters after the first} ■

The box, ■, that ends the line above is an end marker. When we present an example of a point in the text, we shall introduce it with the heading:

EXAMPLE

and finish it with an end marker ■. This will allow us to keep the general discussion separate from the specific examples. We shall use the same end marker to denote the end of a definition or a proof.

DEFINITION 


PROOF 


The old-fashioned end marker denoting that a proof is finished is Q.E.D. This box serves the 
same purpose. 


DEFINITION 

We define the function length of a string to be the number of letters in the string. We write this function using the word “length.” For example, if a = xxxx in the language L₁, then

length(a) = 4

If c = 428 in the language L₃, then

length(c) = 3

Or we could write directly that in L₁

length(xxxx) = 4

and in L₃

length(428) = 3

In any language that includes the empty string Λ, we have

length(Λ) = 0

For any word w in any language, if length(w) = 0, then w = Λ.

We can now present yet another definition of L₃:

L₃ = {any finite string of alphabet letters that, if it has length more than 1, does not start with a 0}

This is not necessarily a better definition of L₃, but it does illustrate that there are often different ways of specifying the same language.
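As a rough check that the last two definitions of L₃ describe the same language, the Python sketch below encodes each one as a membership test and compares them on every digit string of length 1 to 3. The function names are hypothetical. Since L₃ is meant to look like the integers, it contains no null string, so the sketch rejects the empty string explicitly.

```python
from itertools import product

DIGITS = "0123456789"


def is_digit_string(w: str) -> bool:
    """A nonempty string over {0 1 2 ... 9}; L3 never contains the null string."""
    return len(w) > 0 and all(ch in DIGITS for ch in w)


def in_L3_starts_with_zero_rule(w: str) -> bool:
    """If it starts with a 0, it has no more letters after the first."""
    return is_digit_string(w) and (not w.startswith("0") or len(w) == 1)


def in_L3_length_rule(w: str) -> bool:
    """If it has length more than 1, it does not start with a 0."""
    return is_digit_string(w) and (len(w) <= 1 or w[0] != "0")


# Collect every digit string of length 1 to 3 on which the two definitions disagree.
mismatches = []
for n in range(1, 4):
    for letters in product(DIGITS, repeat=n):
        w = "".join(letters)
        if in_L3_starts_with_zero_rule(w) != in_L3_length_rule(w):
            mismatches.append(w)
print(mismatches)  # []
```

A finite comparison like this is evidence, not a proof; here the equivalence is also easy to see directly, because for a nonempty string “length 1” and “length at most 1” coincide.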

There is some inherent ambiguity in the phrase “any finite string,” since it is not clear whether we intend to include the null string (Λ, the string of no letters). To avoid this ambiguity, we shall always be more careful. The language L₃ does not include Λ, since we intended that that language should look like the integers, and there is no such thing as an integer with no digits. On the other hand, we may wish to define a language like L₁ but that does contain Λ:

L₄ = {Λ x xx xxx xxxx . . .}

= {xⁿ for n = 0 1 2 3 . . .}

Here we have said that x⁰ = Λ, not x⁰ = 1 as in algebra. In this way, xⁿ is always the string of n x’s. This may seem like belaboring a trivial point, but the significance of being careful about this distinction will emerge over and over again.

In L₃ it is very important not to confuse 0, which is a string of length 1, with Λ. Remember, even when Λ is a word in the language, it is not a letter in the alphabet.

DEFINITION 

Let us introduce the function reverse. If a is a word in some language L, then reverse(a) is 
the same string of letters spelled backward, called the reverse of a, even if this backward 
string is not a word in L. ■ 

EXAMPLE 

reverse(xxx) = xxx
reverse(xxxxx) = xxxxx
reverse(145) = 541

But let us also note that in L₃

reverse(140) = 041

which is not a word in L₃. ■
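The reverse function is a one-line string slice in Python. This sketch, assuming words are Python strings, also includes a hypothetical membership test in_L3 (for the version of L₃ that admits the word 0) to show that reversing a word of L₃ can leave the language.

```python
def reverse(w: str) -> str:
    """The same string of letters spelled backward (a slice with step -1)."""
    return w[::-1]


def in_L3(w: str) -> bool:
    """L3 with the word 0 allowed: a nonempty digit string that does not
    start with 0 unless it has length 1."""
    return (len(w) > 0 and all(ch in "0123456789" for ch in w)
            and (len(w) == 1 or w[0] != "0"))


for w in ["xxx", "xxxxx", "145", "140"]:
    print("reverse(" + w + ") = " + reverse(w))
print(in_L3("140"), in_L3(reverse("140")))  # True False
```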

DEFINITION 

Let us define a new language called PALINDROME over the alphabet

Σ = {a b}

PALINDROME = {Λ, and all strings x such that reverse(x) = x} ■

If we begin listing the elements in PALINDROME, we find

PALINDROME = {Λ a b aa bb aaa aba bab bbb aaaa abba . . .}

The language PALINDROME has interesting properties that we shall examine later. Sometimes, when we concatenate two words in PALINDROME, we obtain another word in PALINDROME, such as when abba is concatenated with abbaabba. More often, the concatenation is not itself a word in PALINDROME, as when aa is concatenated with aba. Discovering when this does happen is left as a problem at the end of this chapter.
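A membership test for PALINDROME follows directly from its definition. In this sketch the null string Λ is represented by Python's empty string "", which automatically satisfies reverse(x) = x; the function names are illustrative assumptions, not the book's notation.

```python
from itertools import product


def reverse(w: str) -> str:
    return w[::-1]


def in_palindrome(w: str) -> bool:
    """PALINDROME over {a b}: the empty string and every x with reverse(x) = x."""
    return all(ch in "ab" for ch in w) and reverse(w) == w


print(in_palindrome("abba" + "abbaabba"))  # True
print(in_palindrome("aa" + "aba"))         # False: aaaba reversed is abaaa

# The palindromes among all strings of a's and b's of length 0 to 3.
small = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
print([w for w in small if in_palindrome(w)])
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```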

KLEENE CLOSURE

DEFINITION 

Given an alphabet Σ, we wish to define a language in which any string of letters from Σ is a word, even the null string. This language we shall call the closure of the alphabet. It is denoted by writing a star (an asterisk) after the name of the alphabet as a superscript:

Σ*

This notation is sometimes known as the Kleene star after the logician who was one of the founders of this subject. ■

EXAMPLE 

If Σ = {x}, then

Σ* = L₄ = {Λ x xx xxx . . .} ■

EXAMPLE

If Σ = {0 1}, then

Σ* = {Λ 0 1 00 01 10 11 000 001 . . .} ■

EXAMPLE

If Σ = {a b c}, then

Σ* = {Λ a b c aa ab ac ba bb bc ca cb cc aaa . . .} ■

We can think of the Kleene star as an operation that makes an infinite language of 
strings of letters out of an alphabet. When we say “infinite language,” we mean infinitely 
many words, each of finite length. 

Notice that when we wrote out the first several words in the language, we put them in size order (words of shortest length first) and then listed all the words of the same length alphabetically. We shall usually follow this method of sequencing a language. This ordering is called lexicographic order. In a dictionary, the word aardvark comes before cat; in lexicographic ordering it is the other way. Whereas both orderings are useful for the problem of searching for a given word, in the list for infinite sets lexicographic ordering has some distinct advantages. In the language just above, there are infinitely many words that start with the letter a, and they all come alphabetically before the word b. When listed in the usual alphabetical order, the first five words of this language are Λ a aa aaa aaaa, and the three-dot ellipsis “. . .” would not inform us of the real nature of the language.
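Lexicographic order in this book's sense (shorter words first, then alphabetical within each length) is easy to generate lazily, which is exactly why it suits infinite languages. The generator below is a sketch, assuming Λ is represented by the empty string ''; it relies only on itertools.count, itertools.product, and itertools.islice from Python's standard library.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield every string over the alphabet in lexicographic order as used here:
    shorter words first, and words of equal length in alphabetical order."""
    letters = sorted(alphabet)
    for n in count(0):                       # n = 0 yields the null string first
        for choice in product(letters, repeat=n):
            yield "".join(choice)


print(list(islice(shortlex({"a", "b", "c"}), 14)))
# ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc', 'aaa']
```

Because there are only finitely many words of each length, every word of Σ* appears after finitely many steps; a purely alphabetical listing would instead spend forever on Λ a aa aaa . . . and never reach b.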

We shall now generalize the use of the star operator to sets of words, not just sets of alphabet letters.

DEFINITION 


If S is a set of words, then by S* we mean the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string is also included. ■

Let us not make the mistake of confusing the two languages

ENGLISH-WORDS* and ENGLISH-SENTENCES

The first language contains the word butterbutterbutterhat, whereas the second does not. This is because words in ENGLISH-WORDS* are the concatenate of arbitrarily many words from ENGLISH-WORDS, while words in ENGLISH-SENTENCES are restricted to juxtaposing only words from ENGLISH-WORDS in an order that complies with the rules of grammar.

EXAMPLE 

If S = {aa b}, then

S* = {Λ plus any word composed of factors of aa and b}

= {Λ plus all strings of a’s and b’s in which the a’s occur in even clumps}

= {Λ b aa bb aab baa bbb aaaa aabb baab bbaa bbbb
aaaab aabaa aabbb baaaa baabb bbaab bbbaa bbbbb . . .}

The string aabaaab is not in S* since it has a clump of a’s of length 3. The phrase “clump of a’s” has not been precisely defined, but we know what it means anyway. ■
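Deciding whether a given string is in S* amounts to asking whether it can be cut into consecutive factors from S. The sketch below does this with a left-to-right table of reachable prefix positions (a standard dynamic-programming technique, not a method from the text), assuming S is a finite set of nonempty strings. It then compares the result with the “even clumps” description for S = {aa b} on all short strings; the clump test uses Python's re.findall to pick out maximal runs of a’s.

```python
import re
from itertools import product


def in_closure(w, S):
    """Decide whether w is in S*, i.e. whether w can be cut into consecutive
    factors taken from S. Assumes S is a finite set of nonempty strings."""
    reachable = [False] * (len(w) + 1)   # reachable[i]: the prefix w[:i] is in S*
    reachable[0] = True                   # the empty prefix is Λ, always in S*
    for i in range(len(w)):
        if reachable[i]:
            for factor in S:
                if w.startswith(factor, i):
                    reachable[i + len(factor)] = True
    return reachable[len(w)]


def even_clumps(w):
    """Strings of a's and b's in which every maximal run of a's has even length."""
    return (all(ch in "ab" for ch in w)
            and all(len(run) % 2 == 0 for run in re.findall("a+", w)))


S = {"aa", "b"}
print(in_closure("aabaaab", S))  # False: it has a clump of three a's

# The two descriptions agree on every string of a's and b's up to length 8.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert in_closure(w, S) == even_clumps(w)
```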

EXAMPLE 

Let S = {a ab}. Then

S* = {Λ plus any word composed of factors of a and ab}

= {Λ plus all strings of a’s and b’s except those that start with b and those that contain a double b}

= {Λ a aa ab aaa aab aba aaaa aaab aaba abaa abab aaaaa
aaaab aaaba aabaa aabab abaaa abaab ababa . . .}

By the phrase “double b” we mean the substring bb. For each word in S*, every b must have an a immediately to its left. The substring bb is impossible, as is starting with a b. Any string without the substring bb that begins with an a can be factored into terms of (ab) and (a).

The middle definition of this language is not an obvious consequence of the definition of *, but it can be deduced in this case. ■

To prove that a certain word is in the closure language S*, we must show how it can be 
written as a concatenate of words from the base set S. 

In the last example, to show that abaab is in S*, we can factor it as follows: 

(ab)(a)(ab) 

These three factors are all in the set S; therefore, their concatenation is in S*. This is the only way to factor this string into factors of (a) and (ab). When this happens, we say that the factoring is unique.
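To see the unique factoring concretely, the following sketch lists every factoring of a string into words of S by recursing on the first factor. The helper factorings is hypothetical and assumes S is finite and does not contain Λ; the loop at the end checks the middle definition of {a ab}* against the factoring test on all strings of a’s and b’s up to length 8.

```python
from itertools import product


def factorings(w, S):
    """Every way to cut w into consecutive factors from S.
    Assumes S is finite and does not contain the null string."""
    if w == "":
        return [[]]                      # Λ has one factoring: zero factors
    result = []
    for factor in sorted(S):
        if w.startswith(factor):
            for rest in factorings(w[len(factor):], S):
                result.append([factor] + rest)
    return result


def middle_definition(w):
    """Λ, or a string of a's and b's that starts with a and contains no bb."""
    return w == "" or (all(ch in "ab" for ch in w)
                       and w.startswith("a") and "bb" not in w)


S = {"a", "ab"}
print(factorings("abaab", S))  # [['ab', 'a', 'ab']]: the factoring is unique

# Check the middle definition against the factoring test on all short strings.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert (len(factorings(w, S)) > 0) == middle_definition(w)
```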

Sometimes, the factoring is not unique. For example, consider S = {xx xxx}. Then

S* = {Λ and all strings of more than one x}

= {xⁿ for n = 0 2 3 4 5 . . .}

= {Λ xx xxx xxxx xxxxx xxxxxx . . .}

Notice that the word x is not in the language S*. The string xxxxxxx is in this closure for any of these three reasons. It is

(xx)(xx)(xxx) or (xx)(xxx)(xx) or (xxx)(xx)(xx)

Also, x⁶ is either x²x²x² or else x³x³.
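The same kind of enumeration exhibits the non-unique factorings for S = {xx xxx}. This sketch writes out a hypothetical factorings helper again so that it stands alone; it assumes S is finite and contains no null string.

```python
def factorings(w, S):
    """Every way to cut w into consecutive factors from S (S finite, no null string)."""
    if w == "":
        return [[]]                      # Λ has one factoring: zero factors
    result = []
    for factor in sorted(S):
        if w.startswith(factor):
            for rest in factorings(w[len(factor):], S):
                result.append([factor] + rest)
    return result


S = {"xx", "xxx"}
for f in factorings("x" * 7, S):
    print("".join("(" + p + ")" for p in f))
# (xx)(xx)(xxx)
# (xx)(xxx)(xx)
# (xxx)(xx)(xx)
print(len(factorings("x" * 6, S)))  # 2: x²x²x² and x³x³
print(factorings("x", S))           # []: x is not in S*
```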

It is important to note here that the parentheses, ( ), are not letters in the alphabet, but are used for the sole purpose of demarcating the ends of factors. So, we can write xxxxx = (xx)(xxx). In cases where parentheses are letters of the alphabet,

Σ = {x ( )}

length(xxxxx) = 5

but length((xx)(xxx)) = 9

Let us suppose that we wanted to prove mathematically that this set S* contains all xⁿ for n ≠ 1. Suppose that somebody did not believe this and needed convincing. We could proceed as follows.

First, we consider the possibility that there were some powers of x that we could not 
produce by concatenating factors of (xx) and (xxx). 

Obviously, since we can produce x⁴, x⁵, x⁶, the examples of strings that we cannot produce must be large. Let us ask the question, “What is the smallest power of x (larger than 1) that we cannot form out of factors of xx and xxx?” Let us suppose that we start making a list of how to construct the various powers of x. On this list we write down how to form x², x³, x⁴, x⁵, and so on. Let us say that we work our way successfully up to x³⁷³, but then we cannot figure out how to form x³⁷⁴. We become stuck, so a friend comes over to us and says, “Let me see your list. How did you form the word x³⁷²? Why don’t you just concatenate another factor of xx in front of this and then you will have the word x³⁷⁴ that you wanted.” Our friend is right, and this story shows that while writing this list out, we can never really become stuck. This discussion can easily be generalized into a mathematical proof of the fact that S* contains all powers of x greater than 1.

We have just established a mathematical fact by a method of proof that we have rarely seen in other courses. It is a proof based on showing that something exists (the factoring) because we can describe how to create it (by adding xx to a previous case). What we have described can be formalized into an algorithm for producing all the powers of x from the factors xx and xxx. The method is to begin with xx and xxx and, when we want to produce xⁿ, we take the sequence of concatenations that we have already found will produce xⁿ⁻², and we concatenate xx onto that.
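The constructive algorithm translates almost word for word into a program that builds the list from the story: start from xx and xxx, and obtain each later entry by putting one more xx in front of the entry two places earlier. The function name build_factor_list is our own; this is a sketch of the proof's idea, not code from the text.

```python
def build_factor_list(limit):
    """Constructive algorithm: a factoring of x^n into xx and xxx for every n
    from 2 to limit, built by putting xx in front of the factoring of x^(n-2)."""
    table = {2: ["xx"], 3: ["xxx"]}
    for n in range(4, limit + 1):
        table[n] = ["xx"] + table[n - 2]
    return table


table = build_factor_list(374)
print(table[7])                          # ['xx', 'xx', 'xxx']
print("".join(table[374]) == "x" * 374)  # True: we never become stuck at x³⁷⁴
```

The loop never needs a search: each entry is determined by one earlier entry, which is exactly what makes the existence proof constructive.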

The method of proving that something exists by showing how to create it is called proof by constructive algorithm. This is the most important tool in our whole study. Most of the theorems in this book will be proven by the method of constructive algorithm. It is, in general, a very satisfying and useful method of proof, provided that anybody is interested in the objects we are constructing. We may have a difficult time selling powers of x broken into factors of xx and xxx.

Let us observe that if the alphabet has no letters, then its closure is the language with the null string as its only word, because Λ is always a word in a Kleene closure. Symbolically, we write

If Σ = Φ (the empty set), then Σ* = {Λ}

This is not the same as

If S = {Λ}, then S* = {Λ}

which is also true but for a different reason, that is, ΛΛ = Λ.
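Both edge cases can be observed with a finite slice of a closure, formed by concatenating at most a fixed number of words. This is a sketch, assuming Λ is the empty string; closure_up_to is a hypothetical name, and it only approximates S*, which in general is infinite.

```python
from itertools import product


def closure_up_to(S, max_factors):
    """The words of S* that use at most max_factors factors from S (a finite slice of S*)."""
    words = set()
    for k in range(max_factors + 1):              # k = 0 contributes Λ, the empty product
        for choice in product(sorted(S), repeat=k):
            words.add("".join(choice))
    return words


print(closure_up_to(set(), 5))                    # {''}: the closure of the empty alphabet
print(closure_up_to({""}, 5))                     # {''}: because ΛΛ = Λ
print(sorted(closure_up_to({"x"}, 3), key=len))   # ['', 'x', 'xx', 'xxx']
```

For the empty set, product yields a single empty choice when k = 0 and nothing otherwise; for {Λ}, every choice concatenates back to the empty string. In every other case the slice keeps growing as max_factors grows.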

The Kleene closure always produces an infinite language unless the underlying set was one of the two examples above. Unless we insist on calling Kleene closure a very forgiving rule of grammar (anything goes), we have introduced a new method for defining languages that works only for infinite languages.

The Kleene closure of two sets can end up being the same language even if the two sets 
that we started with were not. 

EXAMPLE 

Consider the two languages

S = {a b ab} and T = {a b bb}

Then both S* and T* are languages of all strings of a’s and b’s, since any string of a’s and b’s can be factored into syllables of either (a) or (b), both of which are in S and T. ■
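As a finite spot check of this example (not a proof), the sketch below tests every string of a’s and b’s up to length 8 for membership in both S* and T*. It uses a dynamic-programming factoring test of our own choosing, which assumes each set is finite and contains only nonempty words.

```python
from itertools import product


def in_closure(w, S):
    """True if w can be cut into consecutive factors from S (S finite, no null string)."""
    reachable = [False] * (len(w) + 1)   # reachable[i]: w[:i] is a concatenation of words of S
    reachable[0] = True                   # Λ is always in S*
    for i in range(len(w)):
        if reachable[i]:
            for factor in S:
                if w.startswith(factor, i):
                    reachable[i + len(factor)] = True
    return reachable[len(w)]


S = {"a", "b", "ab"}
T = {"a", "b", "bb"}
strings = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
print(all(in_closure(w, S) and in_closure(w, T) for w in strings))  # True
```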

If for some reason we wish to modify the concept of closure to refer to only the concatenation of some (not zero) strings from a set S, we use the notation + instead of *. For example,

If Σ = {x}, then Σ⁺ = {x xx xxx . . .}

which is the language L₁ that we discussed before.

If S is a set of strings not including Λ, then S⁺ is the language S* without the word Λ. Likewise, if T is a set of letters, then T⁺ means the same as T*, except that it can never contain Λ. If S is a language that does contain Λ, then S⁺ = S*.

This “plus operation” is sometimes called positive closure. 

If S = {xx xxx}, then S⁺ is the same as S* except for the word Λ, which is not in S⁺. This is not to say that S⁺ cannot, in general, contain the word Λ. It can, but only on the condition that S contains the word Λ initially. In this case, Λ is in S⁺, since it is the concatenation of some (actually one) word from S (Λ itself). Anyone who does not think that the null string is confusing has missed something. It is already a problem, and it gets worse later.

EXAMPLE 

If S is the set of three words

S = {w₁ w₂ w₃}

then

S⁺ = {w₁ w₂ w₃ w₁w₁ w₁w₂ w₁w₃ w₂w₁ w₂w₂ w₂w₃

w₃w₁ w₃w₂ w₃w₃ w₁w₁w₁ w₁w₁w₂ . . .}

no matter what the words w₁, w₂, and w₃ are.

If w₁ = aa, w₂ = bbb, w₃ = Λ, then S⁺ = {aa bbb Λ aaaa aabbb . . .}

The words in the set S⁺ are listed above in the order corresponding to their w-sequencing,
not in the usual lexicographic or size-alphabetical order. ■
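The role of Λ in S⁺ can be seen by listing the short words of S⁺. In this sketch (our own; `plus_closure` is a hypothetical name), S⁺ is cut off at a maximum length so that the enumeration terminates. Λ shows up only when it is already in S.

```python
def plus_closure(S: set[str], max_len: int) -> set[str]:
    """Words of S+ with at most max_len letters: concatenations of ONE or more words of S."""
    result = {w for w in S if len(w) <= max_len}  # concatenations of exactly one word
    frontier = set(result)
    while frontier:
        # Append one more word of S to each word found in the previous round.
        longer = {u + w for u in frontier for w in S if len(u + w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result

# w1 = aa, w2 = bbb, w3 = Λ: the null string is in S+ because it is in S.
words = plus_closure({"aa", "bbb", ""}, 5)
print(sorted(words, key=lambda w: (len(w), w)))  # ['', 'aa', 'bbb', 'aaaa', 'aabbb', 'bbbaa']
# S = {xx xxx} does not contain Λ, so neither does S+.
print("" in plus_closure({"xx", "xxx"}, 5))  # False
```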

What happens if we apply the closure operator twice? We start with a set of words S and
look at its closure S*. Now suppose we start with the set S* and try to form its closure,
which we denote as

(S*)* or S**

If S is not the trivial empty set or the set consisting solely of Λ, then S* is infinite, so we are
taking the closure of an infinite set. This should present no problem since every string in the
closure of a set is a combination of only finitely many words from the set. Even if the set S
has infinitely many words, we use only finitely many at a time. This is the same as with
ordinary arithmetic expressions, which can be made up of only finitely many numbers at a time
even though there are infinitely many numbers to choose from.

From now on we shall let the closure operator apply to infinite sets as well as finite sets. 

THEOREM 1 

For any set S of strings we have S* = S**.

CONVINCING REMARKS 

First, let us illustrate what this theorem means. Say, for example, that S = {a b}. Then S*
is clearly all strings of the two letters a and b of any finite length whatsoever. Now what
would it mean to take strings from S* and concatenate them? Let us say we concatenated
(aaba) and (baaa) and (aaba). The end result (aababaaaaaba) is no more than a
concatenation of the letters a and b, just as with all elements of S*.

aababaaaaaba 

= (aaba)(baaa)(aaba)

= [(a)(a)(b)(a)] [(b)(a)(a)(a)] [(a)(a)(b)(a)]

= (a)(a)(b)(a)(b)(a)(a)(a)(a)(a)(b)(a)

Let us consider one more illustration. If S = {aa bbb}, then S* is the set of all strings
where the a's occur in clumps of even length and the b's in groups of 3, 6, 9, . . . . Some words in S* are

aabbbaaaa bbb bbbaa 

If we concatenate these three elements of S*, we get one big word in S**, which is again in S*.

aabbbaaaabbbbbbaa
= [(aa)(bbb)(aa)(aa)] [(bbb)] [(bbb)(aa)]
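The bracketing above is the whole idea of the theorem: factor each S* piece into words of S, then drop the brackets. A minimal sketch of that refinement follows; the helper name `factor` is hypothetical, and it finds one factorization by dynamic programming.

```python
from typing import Optional

def factor(word: str, S: set[str]) -> Optional[list[str]]:
    """Return one way of writing word as a concatenation of words of S, or None."""
    # best[i] holds a factorization of the prefix word[:i], or None if there is none.
    best: list[Optional[list[str]]] = [None] * (len(word) + 1)
    best[0] = []  # the empty prefix is the concatenation of no words
    for i in range(1, len(word) + 1):
        for s in S:
            j = i - len(s)
            if s and j >= 0 and best[j] is not None and word[j:i] == s:
                best[i] = best[j] + [s]
                break
    return best[len(word)]

S = {"aa", "bbb"}
pieces = ["aabbbaaaa", "bbb", "bbbaa"]  # three words of S*
word = "".join(pieces)                  # one word of S**
# Refine each S* piece into factors from S, then flatten: the S** word is in S*.
refined = [f for p in pieces for f in factor(p, S)]
print(refined)  # ['aa', 'bbb', 'aa', 'aa', 'bbb', 'bbb', 'aa']
```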

This theorem expresses a trivial but subtle point. It is analogous to saying that if people are 
made up of molecules and molecules are made up of atoms, then people are made up of 
atoms. 

PROOF 

Every word in S** is made up of factors from S*. Every factor from S* is made up of factors
from S. Therefore, every word in S** is made up of factors from S. Therefore, every word in
S** is also a word in S*. We can write this as

S** ⊆ S*

using the symbol “⊆” from set theory, which means “is contained in or equal to.”

Now, in general, it is true that for any set A we know that A ⊆ A*, since in A* we can
choose as a word any one factor from A. So if we consider A to be our set S*, we have

S* ⊆ S**

Together, these two inclusions prove that


S* = S** ■


PROBLEMS

1. Consider the language S*, where S = {a b}.

How many words does this language have of length 2? of length 3? of length n?

2. Consider the language S*, where S = { aa b }. 

How many words does this language have of length 4? of length 5? of length 6? What 
can be said in general? 

3. Consider the language S*, where S = {ab ba}. Write out all the words in S* that have
seven or fewer letters. Can any word in this language contain the substrings aaa or bbb?
What is the smallest word that is not in this language?

4. Consider the language S*, where S = {a ab ba}. Is the string (abbba) a word in this
language? Write out all the words in this language with six or fewer letters. What is
another way in which to describe the words in this language? Be careful, this is not simply
the language of all words without bbb.

5. Consider the language S*, where S = {aa aba baa}. Show that the words aabaa,
baaabaaa, and baaaaababaaaa are all in this language. Can any word in this language
be interpreted as a string of elements from S in two different ways? Can any word in this
language have an odd total number of a's?

6. Consider the language S*, where S = {xx xxx}. In how many ways can x¹⁹ be written
as the product of words in S? This means: How many different factorizations are there
of x¹⁹ into xx and xxx?

7. Consider the language PALINDROME over the alphabet {a b}.

(i) Prove that if x is in PALINDROME, then so is xⁿ for any n.

(ii) Prove that if y³ is in PALINDROME, then so is y.

(iii) Prove that if zⁿ is in PALINDROME for some n (greater than 0), then z itself is
also.

(iv) Prove that PALINDROME has as many words of length 4 as it does of length 3.

(v) Prove that PALINDROME has as many words of length 2n as it has of length
2n − 1. How many words is that?

8. Show that if the concatenation of two words (neither Λ) in PALINDROME is also a
word in PALINDROME, then both words are powers of some other word; that is, if x
and y and xy are all in PALINDROME, then there is a word z such that x = zⁿ and y = zᵐ
for some integers n and m (maybe n or m = 1).

9. (i) Let S = {ab bb } and let T = {ab bb bbbb }. Show that S* = T*. 

(ii) Let S = {ab bb} and let T = {ab bb bbb}. Show that S* ≠ T*, but that
S* ⊂ T*.

(iii) What principle does this illustrate? 

10. How does the situation in Problem 9 change if we replace the operator * with the
operator + as defined in this chapter? Note the language S⁺ means the same as S*, but does
not allow the “concatenation of no words” of S.

11. Prove that for all sets S, 

(i) (S⁺)* = (S*)*

(ii) (S⁺)⁺ = S⁺

(iii) Is (S*)⁺ = (S⁺)* for all sets S?
12. Let S = {a bb bab abaab}. Is abbabaabab in S*? Is abaabbabbaabb? Does any
word in S* have an odd total number of b's?

13. Suppose that for some language L we can always concatenate two words in L and get
another word in L if and only if the words are not the same. That is, for any words w₁
and w₂ in L where w₁ ≠ w₂, the word w₁w₂ is in L but the word w₁w₁ is not in L. Prove
that this cannot happen.

14. Let us define

S*** = (S**)*

Is this set bigger than S*? Is it bigger than S?

15. Let w be a string of letters and let the language T be defined as adding w to the language 
S. Suppose further that T* = S*. 

(i) Is it necessarily true that w ∈ S?

(ii) Is it necessarily true that w ∈ S*?

16. Give an example of a set S such that the language S* has more six-letter words than
seven-letter words. Give an example of an S* that has more six-letter words than
eight-letter words. Does there exist an S* such that it has more six-letter words than
twelve-letter words?

17. (i) Consider the language S*, where S = {aa ab ba bb}. Give another description
of this language.

(ii) Give an example of a set S such that S* only contains all possible strings of a's and
b's that have length divisible by 3.

(iii) Let S be all strings of a's and b's with odd length. What is S*?

18. (i) If S = {a b} and T* = S*, prove that T must contain S.

(ii) Find another pair of sets S and T such that if T* = S*, then S ⊆ T.

19. One student suggested the following algorithm to test a string of a's and b's to see if it is
a word in S*, where S = {aa ba aba abaab}. Step 1, cross off the longest set of
characters from the front of the string that is a word in S. Step 2, repeat step 1 until it is
no longer possible. If what remains is the string Λ, the original string was a word in S*.
If what remains is not Λ (this means some letters are left, but we cannot find a word in S
at the beginning), the original string was not a word in S*. Find a string that disproves
this algorithm.

20. A language L₁ is smaller than another language L₂ if L₁ ⊆ L₂ and L₁ ≠ L₂. Let T be any
language closed under concatenation; that is, if t₁ ∈ T and t₂ ∈ T, then t₁t₂ is also an
element of T. Show that if T contains S but T ≠ S*, then S* is smaller than T. We can
summarize this by saying that S* is the smallest closed language containing S.



A NEW METHOD FOR DEFINING LANGUAGES 


One of the mathematical tools that we shall find extremely useful in our study, but which is
largely unfamiliar in other branches of mathematics, is a method of defining sets called
recursive definition. A recursive definition is characteristically a three-step process. First, we
specify some basic objects in the set. Second, we give rules for constructing more objects in
the set from the ones we already know. Third, we declare that no objects except those
constructed in this way are allowed in the set.

Let us take an example. Suppose that we are trying to define the set of positive even
integers for someone who knows about arithmetic, but has never heard of the even numbers.
One standard way of defining this set is

EVEN is the set of all positive whole numbers divisible by 2. 

Another way we might try is this: 

EVEN is the set of all 2n where n = 1 2 3 4 . . . .

The third method we present is sneaky, by recursive definition: 

The set EVEN is defined by these three rules: 

Rule 1 2 is in EVEN. 

Rule 2 If x is in EVEN, then so is x + 2. 

Rule 3 The only elements in the set EVEN are those that can be produced from the 
two rules above. 

The last rule above is completely redundant. We state it this once only for pedagogical
reasons, but it is tacitly presumed in all recursive definitions.

There is a reason that the third definition is less popular than the others: It is much 
harder to use in most practical applications. 

For example, suppose that we wanted to prove that 14 is in the set EVEN. To show this
using the first definition, we divide 14 by 2 and find that there is no remainder. Therefore, it
is in EVEN. To prove that 14 is in EVEN by the second definition, we have to somehow
come up with the number 7 and then, since 14 = (2)(7), we know that it is in EVEN. To
prove that 14 is in EVEN using the recursive definition is a lengthier process. We could
proceed as below:

By Rule 1, we know that 2 is in EVEN. 

Then by Rule 2, we know that 2 + 2 = 4 is also in EVEN. 

Again by Rule 2, we know that since 4 has just been shown to be in EVEN, 4 + 2 = 6 is 
also in EVEN. 

The fact that 6 is in EVEN means that when we apply Rule 2, we deduce that 6 + 2 = 8
is in EVEN, too.

Now applying Rule 2 to 8, we derive that 8 + 2 = 10 is another member of EVEN. 

Once more applying Rule 2, this time to 10, we infer that 10 + 2 = 12 is in EVEN. 

And, at last, by applying Rule 2, yet again, to the number 12, we conclude that 
12 + 2 = 14 is, indeed, in EVEN. 
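The chain of deductions above is mechanical enough to generate automatically. A minimal sketch, assuming only the two productive rules (the function name `derive_even` is hypothetical): start from Rule 1 and apply Rule 2 until the target is reached. Because of Rule 3, an odd target can never be reached, so the sketch rejects it up front.

```python
def derive_even(n: int) -> list[int]:
    """The numbers produced, in order, when proving n in EVEN with Rules 1 and 2."""
    if n < 2 or n % 2 != 0:
        raise ValueError(f"{n} cannot be derived: it is not a positive even number")
    chain = [2]                        # Rule 1: 2 is in EVEN
    while chain[-1] < n:
        chain.append(chain[-1] + 2)    # Rule 2: if x is in EVEN, so is x + 2
    return chain

print(derive_even(14))  # [2, 4, 6, 8, 10, 12, 14]
```

The proof for n takes n/2 − 1 applications of Rule 2, which is why this definition is so slow to use.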

Pretty horrible. This, however, is not the only recursive definition of the set EVEN. We
might use:

The set EVEN is defined by these two rules: 

Rule 1 2 is in EVEN. 

Rule 2 If x and y are both in EVEN, then so is 

x + y 

It should be understood that we can apply Rule 2 also to the case where x and y stand for the 
same number. 

We can now prove that 14 is in EVEN in fewer steps: 

By Rule 1 2 is in EVEN.

By Rule 2 x = 2, y = 2 → 4 is in EVEN.

By Rule 2 x = 2, y = 4 → 6 is in EVEN.

By Rule 2 x = 4, y = 4 → 8 is in EVEN.

By Rule 2 x = 6, y = 8 → 14 is in EVEN.
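One systematic way to find short proofs with the x + y rule is to build n/2 from its binary digits: double the current value (Rule 2 with x = y), and add 2 whenever the next binary digit is 1. This "double and add" strategy is our own illustration, and the helper name `even_proof` is hypothetical. It always yields a valid proof, but it is not guaranteed to be the shortest one.

```python
def even_proof(n: int) -> list[tuple[int, int, int]]:
    """Steps (x, y, x + y) proving n in EVEN with the x + y definition.

    Builds n/2 bit by bit (double, then add 2 when the next bit is 1).
    This gives a valid proof but not necessarily the shortest one.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    steps = []
    value = 2                                    # Rule 1: 2 is in EVEN
    for bit in bin(n // 2)[3:]:                  # bits of n/2 after the leading 1
        steps.append((value, value, 2 * value))  # Rule 2 with x = y: doubling
        value *= 2
        if bit == "1":
            steps.append((value, 2, value + 2))  # Rule 2 with y = 2
            value += 2
    return steps

for x, y, z in even_proof(14):
    print(f"By Rule 2  x = {x}, y = {y} -> {z} is in EVEN")
```

For 14 this also needs four applications of Rule 2, passing through 4, 6, 12 instead of 4, 6, 8.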

This is a better recursive definition of the set EVEN because it produces shorter proofs
that elements are in EVEN. The set EVEN, as we have seen, has some very fine definitions
that are not recursive. In later chapters, we shall be interested in certain sets that have no
better definition than the recursive one.

Before leaving this example, let us note that although the second recursive definition is
still harder to use (in proving that given numbers are even) than the two nonrecursive
definitions, it does have some advantages. For instance, suppose we want to prove that the sum of
two numbers in EVEN is also a number in EVEN. This is a trivial conclusion from the
second recursive definition, but to prove this from the first definition is decidedly harder.
Whether or not we want a recursive definition depends on two things: one, how easy the
other possible definitions are to understand; and two, what types of theorems we may wish
to prove about the set.

EXAMPLE 

The following is a recursive definition of the positive integers: 

Rule 1 1 is in INTEGERS. 

Rule 2 If x is in INTEGERS, then so is x + 1. 
If we wanted the set INTEGERS to be defined to include both the positive and negative
integers, we might use the following recursive definition:

Rule 1 1 is in INTEGERS. 

Rule 2 If both x and y are in INTEGERS, then so are x + y and x − y.

Since 1 − 1 = 0 and, for all positive x, 0 − x = −x, we see that the negative integers and
zero are all included in this definition. ■
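To watch zero and the negative integers appear, we can apply the two rules over and over until nothing new is produced. Since the full set is infinite, this sketch (our own; `integers_within` is a hypothetical name) keeps only values inside a window |z| ≤ bound so that the loop terminates.

```python
def integers_within(bound: int) -> set[int]:
    """Members of INTEGERS (second definition) whose absolute value is at most bound."""
    found = {1}  # Rule 1
    while True:
        # Rule 2: x + y and x - y for every pair already known (x may equal y).
        new = {z for x in found for y in found for z in (x + y, x - y) if abs(z) <= bound}
        if new <= found:
            return found
        found |= new

print(sorted(integers_within(3)))  # [-3, -2, -1, 0, 1, 2, 3]
```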


EXAMPLE 

If we wanted a recursive definition for all the positive real numbers, we could try a definition 
of the form: 

Rule 1 x is in POSITIVE. 

Rule 2 If x and y are in POSITIVE, then so are x + y and xy. 

But the problem is that there is no smallest positive real number x on which to build the rest 
of the set. We could try: 

Rule 1 If x is in INTEGERS, “.” is a decimal point, and y is any finite string of digits,
even one that starts with some zeros, then x.y is in POSITIVE.

This definition for POSITIVE has two problems. One, it does not generate all real
numbers (e.g., π is not included because of its infinite length). Two, the definition is not
recursive since we did not use known elements of POSITIVE to create new elements of
POSITIVE; we used an element of INTEGERS and a string of digits instead. We could
try:

Rule 1 1 is in POSITIVE. 

Rule 2 If x and y are in POSITIVE, then so are x + y, x*y, and x/y. 

This does define some set, but it is not the set of positive real numbers (see Problem 17 at 
the end of this chapter). ■ 

Let us consider the way polynomials are usually defined: 

A polynomial is a finite sum of terms, each of which is of the form a real number
times a power of x (that may be x⁰ = 1).

Now let us consider a recursive definition that is designed for people who know
algebraic notation, but do not know what a polynomial is:

The set POLYNOMIAL is defined by these three rules: 

Rule 1 Any number is in POLYNOMIAL. 

Rule 2 The variable x is in POLYNOMIAL. 

Rule 3 If p and q are in POLYNOMIAL, then so are p + q, p − q, (p), and pq.

The symbol pq, which looks like a concatenation of alphabet letters, in algebraic
notation refers to multiplication.

Some sequence of applications of these rules can show that 3x² + 7x − 9 is in
POLYNOMIAL:

By Rule 1 3 is in POLYNOMIAL.

By Rule 2 x is in POLYNOMIAL.

By Rule 3 (3)(x) is in POLYNOMIAL; call it 3x.

By Rule 3 (3x)(x) is in POLYNOMIAL; call it 3x².

By Rule 1 7 is in POLYNOMIAL.

By Rule 3 (7)(x) is in POLYNOMIAL.

By Rule 3 3x² + 7x is in POLYNOMIAL.

By Rule 1 −9 is in POLYNOMIAL.

By Rule 3 3x² + 7x + (−9) = 3x² + 7x − 9 is in POLYNOMIAL.
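The derivation can be replayed in code if each rule becomes a function. In this sketch (the representation is our own choice, not the book's) a polynomial is a tuple of coefficients with the constant term first, so (−9, 7, 3) stands for 3x² + 7x − 9. Rule 3's (p) is mere grouping and needs no function here.

```python
from itertools import zip_longest

def _trim(p: tuple) -> tuple:
    """Drop zero coefficients of the highest powers, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def number(c) -> tuple:          # Rule 1: any number is in POLYNOMIAL
    return (c,)

X = (0, 1)                       # Rule 2: the variable x is in POLYNOMIAL

def add(p: tuple, q: tuple) -> tuple:   # Rule 3: p + q
    return _trim(tuple(a + b for a, b in zip_longest(p, q, fillvalue=0)))

def sub(p: tuple, q: tuple) -> tuple:   # Rule 3: p - q
    return _trim(tuple(a - b for a, b in zip_longest(p, q, fillvalue=0)))

def mul(p: tuple, q: tuple) -> tuple:   # Rule 3: pq
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return _trim(tuple(out))

# The derivation of 3x^2 + 7x - 9 given in the text, one rule application per line.
three_x = mul(number(3), X)          # 3x
three_x2 = mul(three_x, X)           # 3x^2
seven_x = mul(number(7), X)          # 7x
partial = add(three_x2, seven_x)     # 3x^2 + 7x
result = add(partial, number(-9))    # 3x^2 + 7x + (-9)
print(result)  # (-9, 7, 3)
```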

In fact, there are several other sequences that could also produce this result. 

There are some advantages to this definition as well as the evident disadvantages. On
the plus side, it is immediately obvious that the sum and product of polynomials are both
themselves polynomials. This would be a little more complicated to see if we had to provide a proof
based on the classical definition.

Suppose for a moment that we were studying calculus and we had just proven that the
derivative of the sum of two functions is the sum of the derivatives and that the derivative of
the product fg is f′g + fg′. As soon as we prove that the derivative of a number is 0 and that
the derivative of x is 1, we have automatically shown that we can differentiate all
polynomials. This becomes a theorem that can be proven directly from the recursive definition. It is
true that we do not then know that the derivative of xⁿ is nxⁿ⁻¹, but we do know that it can
be calculated for every n.
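The argument translates directly into a recursive procedure with one case per rule. A minimal sketch, assuming polynomials are stored as expression trees built only from numbers, x, sums, and products (the class names are hypothetical, and p − q is written as p + (−q) for brevity):

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:    # Rule 1: a number
    value: float

@dataclass(frozen=True)
class Var:    # Rule 2: the variable x
    pass

@dataclass(frozen=True)
class Add:    # Rule 3: p + q
    p: Expr
    q: Expr

@dataclass(frozen=True)
class Mul:    # Rule 3: pq
    p: Expr
    q: Expr

Expr = Union[Num, Var, Add, Mul]

def derivative(e: Expr) -> Expr:
    """Differentiate by cases on the rule that built e; no formula for x^n is needed."""
    if isinstance(e, Num):
        return Num(0)                                    # derivative of a number is 0
    if isinstance(e, Var):
        return Num(1)                                    # derivative of x is 1
    if isinstance(e, Add):
        return Add(derivative(e.p), derivative(e.q))     # sum rule
    if isinstance(e, Mul):
        return Add(Mul(derivative(e.p), e.q), Mul(e.p, derivative(e.q)))  # product rule
    raise TypeError(f"not a polynomial: {e!r}")

def evaluate(e: Expr, at: float) -> float:
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Var):
        return at
    if isinstance(e, Add):
        return evaluate(e.p, at) + evaluate(e.q, at)
    if isinstance(e, Mul):
        return evaluate(e.p, at) * evaluate(e.q, at)
    raise TypeError(f"not a polynomial: {e!r}")

var = Var()
poly = Add(Add(Mul(Mul(Num(3), var), var), Mul(Num(7), var)), Num(-9))  # 3x^2 + 7x - 9
print(evaluate(derivative(poly), 2))  # 19, i.e. 6x + 7 at x = 2
```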

In this way, we can prove that it is possible to differentiate all polynomials without
giving the best algorithm to do it. Since the topic of this book is computer theory, we are very
interested in proving that certain tasks are possible for a computer to do even if we do not 
know the best algorithms by which to do them. It is for this reason that recursive definitions 
are important to us. 

Before proceeding to more serious matters, let us note that recursive definitions are not 
completely alien to us in the real world. What is the best definition of the set of people who 
are descended from Henry VIII? Is it not: 

Rule 1 The children of Henry VIII are all elements of DESCENDANTS. 

Rule 2 If x is an element of DESCENDANTS, then so are x’s children. 

Given a soldier, policeman, and mailman, it is sometimes not evident whether they are
properly termed members of the federal executive branch of government or some other type of
public servant. This definition clears up the matter: 

Rule 1 The President is in EXECUTIVE-BRANCH-OF-GOVERNMENT. 

Rule 2 If x is in EXECUTIVE-BRANCH-OF-GOVERNMENT and y works for x, 
then y is in EXECUTIVE-BRANCH-OF-GOVERNMENT. 

Also, in mathematics we often see the following definition of factorial: 

Rule 1 0! = 1. 

Rule 2 n! = n · (n − 1)!.

The reason that these definitions are called “recursive” is that one of the rules used to
define the set mentions the set itself. We define EVEN in terms of previously known
elements of EVEN, POLYNOMIAL in terms of previously known elements of POLYNOMIAL.
We define (n + 1)! in terms of the value of n!. In computer languages, when we
allow a procedure to call itself, we refer to the program as recursive. These definitions have the
same self-referential sense.
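The factorial definition shows the correspondence most plainly: Rule 1 becomes the case that stops the recursion, and Rule 2 becomes a procedure calling itself. A minimal sketch (the guard against negative n is our own addition, since the rules say nothing about negative numbers):

```python
def factorial(n: int) -> int:
    """n! computed directly from the two-rule recursive definition."""
    if n < 0:
        raise ValueError("factorial is defined here only for n >= 0")
    if n == 0:
        return 1                    # Rule 1: 0! = 1
    return n * factorial(n - 1)     # Rule 2: n! = n * (n - 1)!, the procedure calls itself

print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```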



EXAMPLE

Observe how natural the following definitions are:

Rule 1 x is in L₁.

Rule 2 If w is any word in L₁, then xw is also in L₁.

L₁ = x⁺ = {x xx xxx . . .}

or

Rule 1 Λ is in L₄.

Rule 2 If w is any word in L₄, then xw is also in L₄.

L₄ = x* = {Λ x xx xxx . . .}

or

Rule 1 x is in L₂.

Rule 2 If w is any word in L₂, then xxw is also in L₂.

L₂ = {x^odd} = {x xxx xxxxx . . .}

or

Rule 1 1 2 3 4 5 6 7 8 9 are in INTEGERS.

Rule 2 If w is any word in INTEGERS, then w0 w1 w2 w3 w4
w5 w6 w7 w8 w9 are also words in INTEGERS.
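All four definitions in this example share one shape: some starting words, plus rules that turn a known word into new words. The sketch below (a hypothetical `generate` helper of our own) applies such rules until nothing new appears, keeping only words up to a chosen length. The cutoff is safe here because none of these rules ever shortens a word.

```python
from typing import Callable, Iterable

Rule = Callable[[str], Iterable[str]]

def generate(seeds: set[str], rules: list[Rule], max_len: int) -> set[str]:
    """All words of length <= max_len obtained from the seeds by repeatedly applying rules.

    Assumes no rule ever shortens a word, so the length cutoff loses nothing shorter.
    """
    words = {w for w in seeds if len(w) <= max_len}
    pending = list(words)
    while pending:
        w = pending.pop()
        for rule in rules:
            for v in rule(w):
                if len(v) <= max_len and v not in words:
                    words.add(v)
                    pending.append(v)
    return words

L1 = generate({"x"}, [lambda w: ["x" + w]], 4)    # {x, xx, xxx, xxxx}
L4 = generate({""}, [lambda w: ["x" + w]], 3)     # {Λ, x, xx, xxx}
L2 = generate({"x"}, [lambda w: ["xx" + w]], 5)   # {x, xxx, xxxxx}
INTEGERS = generate(set("123456789"), [lambda w: [w + d for d in "0123456789"]], 2)
print(len(INTEGERS))  # 99: the numerals 1 through 99
```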

The definition of Kleene closure might have benefited from a recursive definition:

Rule 1 If S is a language, then all the words of S are in S*.

Rule 2 Λ is in S*.

Rule 3 If x and y are in S*, then so is their concatenation xy.
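These three rules can be run as a fixed-point computation: start with Rules 1 and 2, then keep applying Rule 3 until no new word appears. Because concatenation never shortens a word, every word of S* with at most n letters is built only from pieces with at most n letters, so this sketch (hypothetical name `kleene_star`) can safely discard longer words.

```python
def kleene_star(S: set[str], max_len: int) -> set[str]:
    """Words of S* with at most max_len letters, built with Rules 1-3."""
    star = {""}                                        # Rule 2: Λ is in S*
    star |= {w for w in S if len(w) <= max_len}        # Rule 1: words of S are in S*
    changed = True
    while changed:
        # Rule 3: concatenate any two words already known to be in S*.
        new = {x + y for x in star for y in star if len(x + y) <= max_len}
        changed = not new <= star
        star |= new
    return star

print(sorted(kleene_star({"a", "ab"}, 3)))  # ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'aba']
print(kleene_star(set(), 3))                # {''}: the closure of the empty set is {Λ}
```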


AN IMPORTANT LANGUAGE: ARITHMETIC EXPRESSIONS 

Suppose we ask ourselves what constitutes a valid arithmetic expression that can be typed on
one line, in a form digestible by computers. The alphabet for this language is


Σ = {0 1 2 3 4 5 6 7 8 9 + − * / ( )}

Obviously, the following strings are not good:


(3 + 5) + 6)     2(/8 + 9)     (3 + (4 − )8)     2) − (4

The first contains unbalanced parentheses; the second contains the forbidden substring (/.
The third contains the forbidden substring −). The fourth has a close parenthesis before the
corresponding open parenthesis. Are there more rules? The subsequences // and */ are also
forbidden. Are there still more? The most natural way of defining a valid arithmetic
expression, AE, is by using a recursive definition rather than a long list of forbidden substrings and
parentheses requirements. The definition can be written as:

Rule 1 Any number (positive, negative, or zero) is in AE. 

Rule 2 If x is in AE, then so are 

(i) (x)

(ii) −x (provided x does not already start with a minus sign)

Rule 3 If x and y are in AE, then so are:

(i) x + y (if the first symbol in y is not + or −)

(ii) x − y (if the first symbol in y is not + or −)

(iii) x*y 

(iv) x/y 

(v) x**y (our notation for exponentiation) 

We have called this the “most natural” definition because, even though we may never
have articulated this point, it truly is the method we use for recognizing arithmetic
expressions in real life. If we are presented with

(2 + 4) * (7 * (9 − 3)/4)/4 * (2 + 8) − 1

and asked to determine whether it is a valid arithmetic expression, we do not really scan over
the string looking for forbidden substrings or count the parentheses. We imagine it in our
mind broken down into its components. (2 + 4) that is OK, (9 − 3) that is OK, 7 * (9 − 3)/4
that is OK, and so on. We may never have seen a definition of “arithmetic expressions”
before, but this is what we have always intuitively meant by the phrase.
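The "break it into components" habit can be made precise: a string is in AE exactly when some rule could have produced it from shorter members of AE. The sketch below is our own, not the book's. It tries every rule and every split point, remembering answers for substrings already examined. Two assumptions are ours: a "number" in Rule 1 is an optionally signed string of decimal digits, and expressions are typed with the ASCII hyphen-minus. Spaces are ignored.

```python
import re
from functools import lru_cache

# Assumption: a "number" in Rule 1 is an optionally signed string of decimal digits.
NUMBER = re.compile(r"-?[0-9]+")
OPERATORS = ("**", "+", "-", "*", "/")  # the five operators of Rule 3

def is_ae(expr: str) -> bool:
    """Decide membership in AE by trying every rule of the recursive definition."""
    return _is_ae(expr.replace(" ", ""))

@lru_cache(maxsize=None)
def _is_ae(s: str) -> bool:
    if not s:
        return False
    if NUMBER.fullmatch(s):                                # Rule 1
        return True
    if s[0] == "(" and s[-1] == ")" and _is_ae(s[1:-1]):   # Rule 2(i): (x)
        return True
    if s[0] == "-" and s[1:2] != "-" and _is_ae(s[1:]):    # Rule 2(ii): -x
        return True
    for i in range(1, len(s)):                             # Rule 3: x op y
        for op in OPERATORS:
            if not s.startswith(op, i):
                continue
            x, y = s[:i], s[i + len(op):]
            if op in ("+", "-") and y[:1] in ("+", "-"):
                continue                                   # side condition of 3(i), 3(ii)
            if y and _is_ae(x) and _is_ae(y):
                return True
    return False

print(is_ae("(2 + 4) * (7 * (9 - 3)/4)/4 * (2 + 8) - 1"))  # True
print(is_ae("2(/8 + 9)"))                                   # False
```

Every recursive call is on a strictly shorter string, so the search always terminates.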

This definition gives us the possibility of writing 2 + 3 + 4, which is not ambiguous.
But it also gives us 8/4/2, which is. It could mean 8/(4/2) = 4 or (8/4)/2 = 1. Also, 3 + 4*5
is ambiguous. So, we usually adopt conventions of operator hierarchy and left-to-right
execution. By applying Rule 2, we could always put in enough parentheses to avoid any
confusion if we so desired. We return to this point in Part II, but for now this definition adequately
defines the language of all valid strings of symbols for arithmetic expressions. Remember,
the ambiguity in the string 8/4/2 is a problem of meaning. There is no doubt that the string is
a word in AE, only doubt about what it means.
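The ambiguity can be made concrete by computing every value a string could have, one value per way of choosing which operation is performed last at each level. This sketch is ours: the helper `all_values` is hypothetical, and it assumes unsigned integer operands, no parentheses, no **, and no division by zero. It uses exact fractions to avoid rounding.

```python
import operator
import re
from fractions import Fraction

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def all_values(expr: str) -> set[Fraction]:
    """Every value that some full parenthesization of expr could give."""
    tokens = re.findall(r"\d+|[-+*/]", expr)
    nums = [Fraction(t) for t in tokens[0::2]]   # operands
    ops = tokens[1::2]                           # ops[k] sits between nums[k] and nums[k+1]

    def values(i: int, j: int) -> set[Fraction]:
        if i == j:
            return {nums[i]}
        out = set()
        for k in range(i, j):                    # ops[k] is the operation performed last
            for a in values(i, k):
                for b in values(k + 1, j):
                    out.add(OPS[ops[k]](a, b))
        return out

    return values(0, len(nums) - 1)

print(sorted(str(v) for v in all_values("8/4/2")))   # ['1', '4']
print(sorted(str(v) for v in all_values("3+4*5")))   # ['23', '35']
print(sorted(str(v) for v in all_values("2+3+4")))   # ['9']
```

A string is unambiguous in meaning exactly when this set has one element; 2 + 3 + 4 qualifies, while 8/4/2 and 3 + 4*5 do not.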

This definition determines the set AE in a manner useful for proving many theorems 
about arithmetic expressions. 

THEOREM 2 

An arithmetic expression cannot contain the character $. 

PROOF 

This character is not part of any number, so it cannot be introduced into an AE by Rule 1. If
the character string x does not contain the character $, then neither do the strings (x) and
−x, so it cannot be introduced into an AE by Rule 2. If neither x nor y contains the
character $, then neither do any of the expressions defined by Rule 3. Therefore, the character $
can never get into an AE. ■

THEOREM 3 

No AE can begin or end with the symbol /. 

PROOF 

No number begins or ends with this symbol, so it cannot occur by Rule 1. Any AE formed 
by Rule 2 must begin and end with parentheses or begin with a minus sign, so the / cannot 
be introduced by Rule 2. If x does not already begin with a / and y does not end with a /,
then any AE formed by any clause in Rule 3 will not begin or end with a /. Therefore, these 
rules will never introduce an expression beginning or ending with a /. ■ 

These proofs are like the story of the three chefs making a stew. One can add only meat
to the pot. One can add only carrots to the pot. One can add only potatoes to the pot. Even
without knowing exactly in what order the chefs visit the pot or how often, we still can
conclude that the pot cannot end up with an alarm clock in it. If no rule contributes a $, then one
never gets put in even though if x had a $, then x + y would also.

The symbol “/” has many names. In computer science, it is usually called a “slash”;
other names are “oblique stroke,” “solidus,” and “virgule.” It also figures in one more theorem.

THEOREM 4 

No AE can contain the substring //. 

PROOF 

For variation, we shall prove this result by contradiction, even though a direct argument
similar to those above could easily be given.

Let us suppose that there were some AEs that contained the substring //. Let a shortest 
of these be a string called w. This means that w is a valid AE that contains the substring //, 
but there is no shorter word in AE that contains this substring. There may be more strings of 
the same length as w that contain //, but it does not matter which of these we begin with and 
choose to call w. 

Now we know that w, like all words in AE, is formed by some sequence of applications
of Rules 1, 2, and 3. Our first question is: Which was the last rule used in the production of
w? This is easy to answer. We shall show that it must have been Rule 3(iv). If it were Rule
3(iii), for instance, then the // must either be found in the x or y part. But x and y are
presumed to be in AE, so this would mean that there is some shorter word in AE than w that
contains the substring //, which contradicts the assumption that w is the shortest. Similarly,
we can eliminate all the other possibilities. Therefore, the last rule used to produce w must
have been 3(iv).

Now, since the // cannot have been contributed to w from the x part alone or from the y
part alone (or else x or y are shorter words in AE with a double slash), it must have been
included by finding an x part that ended in a / or a y part that began with a /. But since both x
and y are AEs, our previous theorem says that neither case can happen. Therefore, even Rule
3(iv) cannot introduce the substring //.

Therefore, there is no possibility left for the last rule from which w can be constructed. 
Therefore, w cannot be in the set AE. Therefore, there is no shortest AE that contains the 
substring //. Therefore, nothing in the set AE can have the substring //. ■ 
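The three theorems can also be spot-checked, though not proved, by building many random members of AE with the rules themselves and inspecting them. In this sketch (our own; `random_ae` is hypothetical), when a rule's side condition would fail, the generator first wraps the operand in parentheses with Rule 2(i), so every string it returns really is in AE.

```python
import random

def random_ae(depth: int, rng: random.Random) -> str:
    """Build one member of AE by a random sequence of Rules 1-3; depth caps the nesting."""
    if depth == 0 or rng.random() < 0.3:
        return str(rng.randint(-99, 99))                 # Rule 1: a number
    rule = rng.choice(["paren", "neg", "+", "-", "*", "/", "**"])
    x = random_ae(depth - 1, rng)
    if rule == "paren":
        return "(" + x + ")"                             # Rule 2(i)
    if rule == "neg":
        # Rule 2(ii) needs x not to start with a minus sign; wrap it with Rule 2(i) first.
        return "-" + ("(" + x + ")" if x.startswith("-") else x)
    y = random_ae(depth - 1, rng)
    if rule in ("+", "-") and y[0] in "+-":
        y = "(" + y + ")"                                # side condition of Rule 3(i), 3(ii)
    return x + rule + y                                  # Rule 3

rng = random.Random(0)
samples = [random_ae(6, rng) for _ in range(2000)]
assert all("$" not in e for e in samples)                                   # Theorem 2
assert all(not e.startswith("/") and not e.endswith("/") for e in samples)  # Theorem 3
assert all("//" not in e for e in samples)                                  # Theorem 4
print("no counterexample among", len(samples), "samples")
```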

This method of argument should sound familiar. It is similar to the proof that
{xx xxx}* contains all xⁿ for n ≠ 1.

The long-winded but careful proof of the last theorem is given to illustrate that recursive
definitions can be conveniently employed in rigorous mathematical proofs. Admittedly, this
was a trivial example of the application of this method. Most people would be just as
convinced by the following “proof”:
How could an arithmetic expression contain the substring //? What would it mean?

Huh? What are you, crazy or something? 

We should bear in mind that we are only on the threshold of investigating a very complex
and profound subject and that in this early chapter we wish to introduce a feel for the
techniques and viewpoints that will be relied on heavily later, under far less obvious
circumstances. We will use our learner’s permit to spend a few hours driving around an empty
parking lot before venturing onto the highway.

Another common use for recursive definitions is to determine what expressions are valid
in symbolic logic. We shall be interested in one particular branch of symbolic logic called
sentential calculus or propositional calculus. The version we shall define here uses only
negation ¬ and implication → along with the phrase variables, although conjunction and
disjunction could easily be added to the system. The valid expressions in this language are
traditionally called WFFs for well-formed formulas.

As with AE, parentheses are letters in the alphabet:

Σ = {¬ → ( ) a b c d . . .}

There are other symbols sometimes used for negation, such as ~ and −.
The rules for forming WFFs are:

Rule 1 Any single Latin letter is a WFF:

a b c d . . .

Rule 2 If p is a WFF, then so are (p) and ¬p.

Rule 3 If p and q are WFFs, then so is p → q.

Some sequences of applications of these rules enable us to show that

p → ((p → p) → q)

is a WFF. Without too much difficulty, we can also show that

p →     → p     (p → p     p) → p(

are all not WFFs.
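As with AE, membership can be decided by asking which rule could have produced the string last. The sketch below is our own; the helper `is_wff` is hypothetical, and we assume the variables are the lowercase letters a to z. It tries Rule 2 in both forms and every occurrence of → as the main connective of Rule 3, ignoring spaces.

```python
from functools import lru_cache

NOT, IMPLIES = "¬", "→"

def is_wff(formula: str) -> bool:
    """Decide whether formula is a WFF by trying Rules 1-3 (spaces are ignored)."""
    return _is_wff(formula.replace(" ", ""))

@lru_cache(maxsize=None)
def _is_wff(s: str) -> bool:
    if not s:
        return False                                  # the empty string is not a WFF
    if len(s) == 1:
        return "a" <= s <= "z"                        # Rule 1: a single letter
    if s[0] == "(" and s[-1] == ")" and _is_wff(s[1:-1]):
        return True                                   # Rule 2: (p)
    if s[0] == NOT and _is_wff(s[1:]):
        return True                                   # Rule 2: ¬p
    # Rule 3: p → q, trying every occurrence of → as the main connective.
    return any(c == IMPLIES and _is_wff(s[:i]) and _is_wff(s[i + 1:])
               for i, c in enumerate(s))

print(is_wff("p → ((p → p) → q)"))  # True
print(is_wff("p) → p("))            # False
```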

As a final note in this section, we should be wary that we have sometimes used recursive 
definitions to define membership in a set, as in the phrase “x is in POLYNOMIAL” or “x is 
in EVEN,” and sometimes to define a property, as in the phrase “x is a WFF” or “x is even.” 
This should not present any problem. 

PROBLEMS

1. Write another recursive definition for the language L₁ of Chapter 2.

2. Using the second recursive definition of the set EVEN, how many different ways can we 
prove that 14 is in EVEN? 

3. Using the second recursive definition of EVEN, what is the smallest number of steps
required to prove that 100 is EVEN? Describe a good method for showing that 2n is in
EVEN.

4. Show that the following is another recursive definition of the set EVEN: 

Rule 1 2 and 4 are in EVEN. 

Rule 2 If x is in EVEN, then so is x + 4. 


5. Show that there are infinitely many different recursive definitions for the set EVEN.

6. Using any recursive definition of the set EVEN, show that all the numbers in it end in 
the digits 0, 2, 4, 6, or 8. 

7. The set POLYNOMIAL defined in this chapter contains only the polynomials in the one 
variable x. Write a recursive definition for the set of all polynomials in the two variables 
x and y. 

8. Define the set of valid algebraic expressions ALEX as follows: 

Rule 1 All polynomials are in ALEX. 

Rule 2 If f(x) and g(x) are in ALEX, then so are:

(i) (f(x))

(ii) −(f(x))

(iii) f(x) + g(x)

(iv) f(x) − g(x)

(v) f(x)g(x)

(vi) f(x)/g(x)

(vii) f(x)^g(x)

(viii) f(g(x))

(a) Show that (x + 2)^(3x) is in ALEX.

(b) Show that elementary calculus contains enough rules to prove the theorem that all 
algebraic expressions can be differentiated. 

(c) Is Rule 2 (viii) really necessary? 

9. Using the fact that 3x² + 7x − 9 = (((((3)x) + 7)x) − 9), show how to produce this
polynomial from the rules for POLYNOMIAL using multiplication only twice. What is the
smallest number of steps needed for producing x⁸ + x⁴? What is the smallest number of
steps needed for producing 7x⁷ + 5x⁵ + 3x³ + x?

10. Show that if n is less than 31, then xⁿ can be shown to be in POLYNOMIAL in fewer
than eight steps.

11. In this chapter, we mentioned several substrings of length 2 that cannot occur in
arithmetic expressions, such as (/, +), //, and */. What is the complete list of substrings of
length 2 that cannot occur?

12. Are there any substrings of length 3 that cannot occur that do not contain forbidden
substrings of length 2? (This means that /// is already known to be illegal because it
contains the forbidden substring //.) What is the longest forbidden substring that does not
contain a shorter forbidden substring?

13. The rules given earlier for the set AE allow for the peculiar expressions 

(((((9))))) and -(-(-(-(9)))) 

It is not really harmful to allow these in AE, but is there some modified definition of AE 
that eliminates this problem? 

14. (i) Write out the full recursive definition for the propositional calculus that contains the
symbols ∨ and ∧ as well as ¬ and →.

(ii) What are all the forbidden substrings of length 2 in this language?

15. (i) When asked to give a recursive definition for the language PALINDROME over the
alphabet Σ = {a b}, a student wrote:

Rule 1 a and b are in PALINDROME.

Rule 2 If x is in PALINDROME, then so are axa and bxb.

Unfortunately, all the words in the language defined above have an odd length and
so it is not all of PALINDROME. Fix this problem.

(ii) Give a recursive definition for the language EVENPALINDROME of all
palindromes of even length.

16. (i) Give a recursive definition for the set ODD = {1 3 5 7 . . .}.

(ii) Give a recursive definition for the set of strings of digits 0, 1, 2, 3, . . . 9 that
cannot start with the digit 0.

17. In this chapter, we attempted to define the positive numbers by the following rules:

Rule 1 1 is in L.

Rule 2 If x and y are in L, then so are x + y, x*y, and x/y.

The language L defined in this way is a famous mathematical set. What is it? Prove it.

18. Give two recursive definitions for the set

POWERS-OF-TWO = {1 2 4 8 16 . . .}

Use one of them to prove that the product of two POWERS-OF-TWO is also a
POWER-OF-TWO.

19. Give recursive definitions for the following languages over the alphabet {a b}:

(i) The language EVENSTRING of all words of even length.

(ii) The language ODDSTRING of all words of odd length.

(iii) The language AA of all words containing the substring aa.

(iv) The language NOTAA of all words not containing the substring aa.

20. (i) Consider the following recursive definition of 3-PERMUTATION:

Rule 1 123 is a 3-PERMUTATION.

Rule 2 If xyz is a 3-PERMUTATION, then so are zyx and yzx.

Show that there are six different 3-PERMUTATIONs.

(ii) Consider the following recursive definition of 4-PERMUTATION:

Rule 1 1234 is a 4-PERMUTATION.

Rule 2 If xyzw is a 4-PERMUTATION, then so are wzyx and yzwx.

How many 4-PERMUTATIONs are there (by this definition)?
DEFINING LANGUAGES BY ANOTHER NEW METHOD

We wish now to be very careful about the phrases we use to define languages. We defined $L_1$ in Chapter 2 by the symbols:

$L_1 = \{x^n \text{ for } n = 1\ 2\ 3\ \ldots\}$

and we presumed that we all understood exactly which values $n$ could take. We might even have defined the language $L_2$ by the symbols:

$L_2 = \{x^n \text{ for } n = 1\ 3\ 5\ 7\ \ldots\}$

and again we could presume that we all agree on what words are in this language. 

We might define a language by the symbols: 

$L_5 = \{x^n \text{ for } n = 1\ 4\ 9\ 16\ \ldots\}$

but now the symbols are becoming more of an IQ test than a clear definition. 

What words are in the language 

$L_6 = \{x^n \text{ for } n = 3\ 4\ 8\ 22\ \ldots\}$?

Perhaps these are the ages of the sisters of Louis XIV when he assumed the throne of 
France. More precision and less guesswork are required, especially where computers are 
concerned. In this chapter, we shall develop some new language-defining symbolism that 
will be much more precise than the ellipsis. 

Let us reconsider the language $L_4$ of Chapter 2:


$L_4 = \{\Lambda\ x\ xx\ xxx\ xxxx\ \ldots\}$



In that chapter, we presented one method for indicating this set as the closure of a smaller 
set. 

Let $S = \{x\}$. Then $L_4 = S^*$.


As shorthand for this, we could have written 


$L_4 = \{x\}^*$
We now introduce the use of the Kleene star applied not to a set, but directly to the letter x and written as a superscript as if it were an exponent:

x*


The simple expression x* will be used to indicate some sequence of x's (maybe none at all). This x is intentionally written in boldface type to distinguish it from an alphabet character.

$x^* = \Lambda \text{ or } x \text{ or } x^2 \text{ or } x^3 \text{ or } x^4 \ldots$

$\quad = x^n \text{ for some } n = 0\ 1\ 2\ 3\ 4\ \ldots$

We can think of the star as an unknown power or undetermined power. That is, x* stands for a string of x's, but we do not specify how many. It stands for any string of x's in the language $L_4$.

The star operator applied to a letter is analogous to the star operator applied to a set. It represents an arbitrary concatenation of copies of that letter (maybe none at all). This notation can be used to help us define languages by writing

$L_4 = \text{language}(x^*)$

Since x* is any string of x's, $L_4$ is then the set of all possible strings of x's of any length (including Λ).
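As an informal illustration (not part of the book), x* can be modelled in Python as a lazy generator that yields xⁿ for n = 0 1 2 3 ..., shortest first. The empty string "" stands in for Λ, and the name `x_star` is a hypothetical choice of ours.

```python
from itertools import count, islice


def x_star():
    """Yield the words of language(x*) shortest first: Λ (here ""), x, xx, ..."""
    for n in count(0):  # n = 0 1 2 3 ... is the undetermined power
        yield "x" * n


first_five = list(islice(x_star(), 5))
print(first_five)  # ['', 'x', 'xx', 'xxx', 'xxxx']
```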

We should not confuse x*, which is a language-defining symbol, with $L_4$, which is the name we have given to a certain language. This is why we use the word “language” in the equation. We shall soon give a name to the world in which this symbol x* lives, but not quite yet. Suppose that we wished to describe the language L over the alphabet Σ = {a b}, where

L = {a ab abb abbb abbbb . . .}

We could summarize this language by the English phrase “all words of the form one a followed by some number of b's (maybe no b's at all).”

Using our star notation and boldface letters, we may write

L = language(a b*)

or without the space

L = language(ab*)


The meaning is clear: This is a language in which the words are the concatenation of an initial a with some or no b's (i.e., b*).

Whether we put a space inside ab* or not is only for the clarity of reading; it does not change the set of strings this represents. No string can contain a blank unless a blank is a character in the alphabet Σ. If we want blanks to be in the alphabet, we normally introduce some special symbol to stand for them, as blanks themselves are invisible to the naked eye. The reason for putting a blank between a and b* in the product above is to emphasize the point that the star operator is applied to the b only. We have now used a boldface letter without a star as well as with a star.
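To experiment with expressions like ab*, one option (our assumption, not the book's method) is Python's standard `re` module: enumerate every string over {a b} up to a length bound and keep those that `re.fullmatch` accepts in full. The star means the same thing in Python; the blank must be dropped, because Python would treat it as a literal character. Looking ahead, Python writes the book's choice-plus as `|`, since its own `+` means "one or more". The helpers `words_up_to` and `language` are hypothetical names.

```python
import re
from itertools import product


def words_up_to(alphabet, max_len):
    """All strings over `alphabet` with 0..max_len letters, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def language(pattern, alphabet="ab", max_len=4):
    """The words of length <= max_len that `pattern` matches in full."""
    return [w for w in words_up_to(alphabet, max_len) if re.fullmatch(pattern, w)]


print(language("ab*"))  # ['a', 'ab', 'abb', 'abbb']
```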

We can apply the Kleene star to the whole string ab if we want, as follows: 

(ab)* = Λ or ab or abab or ababab . . .

Parentheses are not letters in the alphabet of this language, so they can be used to indicate factoring without accidentally changing the words. Since the star represents some kind of exponentiation, we use it as powers are used in algebra, where by universal understanding the expression $xy^2$ means $x(y^2)$, not $(xy)^2$.
If we want to define the language $L_1$ this way, we may write

$L_1 = \text{language}(xx^*)$

This means that we start each word of $L_1$ by writing down an x and then we follow it with some string of x's (which may be no more x's at all). Or we may use the + notation from Chapter 2 and write

$L_1 = \text{language}(x^+)$

meaning all words of the form x to some positive power (i.e., not $x^0 = \Lambda$). The + notation is a convenience, but is not essential since we can say the same thing with *'s alone.

EXAMPLE 

The language $L_1$ can be defined by any of the expressions below:

$xx^* \quad x^+ \quad xx^*x^* \quad x^*xx^* \quad x^+x^* \quad x^*x^+ \quad x^*x^*x^*xx^*$

Remember, x* can always be Λ. ■
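These equivalences can be spot-checked, though not proved, by comparing each expression with $L_1$ on all strings xⁿ for n ≤ 8. The sketch assumes Python's `re` syntax, in which `x+` (one or more) happens to coincide with the book's superscript +.

```python
import re

# Python's + (one or more) happens to agree with the book's superscript +.
expressions = ["xx*", "x+", "xx*x*", "x*xx*", "x+x*", "x*x+", "x*x*x*xx*"]
words = ["x" * n for n in range(9)]          # Λ (""), x, xx, ..., x^8
expected = {w for w in words if w}           # L1 = {x^n for n = 1 2 3 ...}

results = {p: {w for w in words if re.fullmatch(p, w)} == expected for p in expressions}
print(results)  # every value is True
```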

EXAMPLE 

The language defined by the expression 

ab*a 

is the set of all strings of a's and b's that have at least two letters, that begin and end with a's, and that have nothing but b's inside (if anything at all).

Language(ab*a) = {aa aba abba abbba abbbba . . .}

It would be a subtle mistake to say only that this language is the set of all words that begin 
and end with an a and have only b's in between, because this description may also apply to 
the word a, depending on how it is interpreted. Our symbolism eliminates this ambiguity. 

■ 

EXAMPLE 

The language of the expression 

a*b* 

contains all the strings of a's and b's in which all the a's (if any) come before all the b's (if any). 

Language(a*b*) = {Λ a b aa ab bb aaa aab abb bbb aaaa . . .}

Notice that ba and aba are not in this language. Notice also that there need not be the same 
number of a’s and b's . ■ 

Here we should again be very careful to observe that 

a*b* ≠ (ab)*

since the language defined by the expression on the right contains the word abab , whereas 
the language defined by the expression on the left does not. This cautions us against thinking 
of the * as a normal algebraic exponent. 
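A quick membership test makes the difference concrete: abab is accepted by (ab)* but not by a*b*, and aabb the other way round. This is a sketch using Python's `re` module, with `in_language` as a hypothetical helper name.

```python
import re


def in_language(pattern, word):
    """True when `pattern` matches the whole of `word`."""
    return re.fullmatch(pattern, word) is not None


for w in ["", "ab", "abab", "aabb", "ba"]:
    print(repr(w), in_language("a*b*", w), in_language("(ab)*", w))
# '' True True
# 'ab' True True
# 'abab' False True
# 'aabb' True False
# 'ba' False False
```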
The language defined by the expression a*b*a* contains the word baa since it starts
with zero a's followed by one b followed by two a’s. 


EXAMPLE 


The following expressions both define the language $L_2 = \{x^{\text{odd}}\}$:


x(xx)* or (xx)*x 


but the expression

x*xx*

does not since it includes the word (xx)x(x). ■

We now introduce another use for the plus sign. By the expression x + y where x and y are strings of characters from an alphabet, we mean “either x or y.” This means that x + y offers a choice, much the same way that x* does. Care should be taken so as not to confuse this with + as an exponent.


EXAMPLE 

Consider the language T defined over the alphabet Σ = {a b c}:

T = {a c ab cb abb cbb abbb cbbb abbbb cbbbb . . .}

All the words in T begin with an a or a c and then are followed by some number of b's. Symbolically, we may write this as

T = language((a + c)b*) 

= language(either a or c then some b’ s) 

We should, of course, have said “some or no b's.” We often drop the zero option because it is
tiresome. We let the word “some” always mean “some or no,” and when we mean “some 
positive number of,” we say that. 

We say that the expression (a + c)b* defines a language in the following sense. For each * or +, used as a superscript, we must select some number of factors for which it stands. For each other +, we must decide whether to choose the right-side expression or the left-side expression. For every set of choices, we have generated a particular string. The set of all strings that can be produced by this method is the language of the expression. In the example

(a + c)b* 

we must choose either a or c for the first letter and then we choose how many b's the b* stands for. Each set of choices is a word. If from (a + c) we choose c and we choose b* to mean bbb, we have the word cbbb. ■
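The “make a choice for each + and each *” reading can be written out directly. This hypothetical helper loops over the choices for b* and for (a + c) and concatenates the results; it reproduces the beginning of T in the order listed above.

```python
def a_or_c_then_bs(max_bs):
    """Build words of language((a + c)b*) by making each set of choices explicitly."""
    words = []
    for k in range(max_bs + 1):   # choice for the star: b* stands for k copies of b
        for first in "ac":        # choice for the +: the left side a or the right side c
            words.append(first + "b" * k)
    return words


print(a_or_c_then_bs(3))
# ['a', 'c', 'ab', 'cb', 'abb', 'cbb', 'abbb', 'cbbb']
```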




EXAMPLE 


Now let us consider a finite language L that contains all the strings of a's and b's of length three exactly:

L = {aaa aab aba abb baa bab bba bbb } 




The first letter of each word in L is either an a or a b. The second letter of each word in L is either an a or a b. The third letter of each word in L is either an a or a b. So, we may write

L = language((a + b)(a + b)(a + b))

or for short,

$L = \text{language}((a + b)^3)$


If we want to define the set of all seven-letter strings of a's and b's, we could write $(a + b)^7$. In general, if we want to refer to the set of all possible strings of a's and b's of any length whatsoever, we could write

(a + b)* 

This is the set of all possible strings of letters from the alphabet Σ = {a b} including the null string. This is a very important expression and we shall use it often.

Again, this expression represents a language. If we choose that * stands for 5, then

(a + b)*

gives

$(a + b)^5 = (a + b)(a + b)(a + b)(a + b)(a + b)$

We now have to make five more choices: either a or b for the first letter, either a or b for the 
second letter, and so on. 
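Each factor of (a + b)ⁿ is an independent a-or-b choice, which is exactly what Python's `itertools.product` enumerates. A sketch, with `strings_of_length` as a hypothetical name:

```python
from itertools import product


def strings_of_length(n, alphabet="ab"):
    """language((a + b)^n): one independent a-or-b choice for each of the n positions."""
    return ["".join(choice) for choice in product(alphabet, repeat=n)]


print(strings_of_length(3))
# ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']
print(len(strings_of_length(5)))  # 32, that is, 2 ** 5
```

The output for n = 3 is the eight-word language L listed above, in the same order.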

This is a very powerful notation. We can describe all words that begin with the letter a 
simply as 

a(a + b)* 

that is, first an a, then anything (as many choices as we want of either letter a or b). 

All words that begin with an a and end with a b can be defined by the expression 

a(a + b)*b = a(arbitrary string)b


FORMAL DEFINITION OF REGULAR EXPRESSIONS 

After all the introduction we have endured of the slow evolution of these language-defining expressions, it is time for us to identify them with their proper name and give them a mathematical definition. As is no surprise to those who have read the title of this chapter, these are called regular expressions. Similarly, the corresponding languages that they define are referred to as regular languages. We shall soon see that this language-defining tool is of limited capacity in that there are many interesting languages that cannot be defined by regular expressions, which is why this volume has more than 100 pages. A regular language is one that can be defined by a regular expression even though it may also have many other fine definitions. A regular expression, on the other hand, must take a very rigorous form as defined below recursively.


DEFINITION 

The symbols that appear in regular expressions are the letters of the alphabet Σ, the symbol for the null string Λ, parentheses, the star operator, and the plus sign.
The set of regular expressions is defined by the following rules:

Rule 1 Every letter of Σ can be made into a regular expression by writing it in boldface; Λ itself is a regular expression.

Rule 2 If $r_1$ and $r_2$ are regular expressions, then so are:

(i) $(r_1)$

(ii) $r_1r_2$

(iii) $r_1 + r_2$

(iv) $r_1^*$

Rule 3 Nothing else is a regular expression. ■ 
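One way to mirror this recursive definition in code is a small tree type with one node per rule; this is a sketch under our own naming, not the book's. Parentheses, Rule 2(i), become implicit in the tree structure, and the printer `show` adds them back only where the written form needs them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Letter:          # Rule 1: a letter of the alphabet
    char: str


@dataclass(frozen=True)
class Lambda:          # Rule 1: the regular expression Λ
    pass


@dataclass(frozen=True)
class Concat:          # Rule 2(ii): r1 r2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Plus:            # Rule 2(iii): r1 + r2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:            # Rule 2(iv): r1*
    inner: "Regex"


# Rule 3: these five node types are the only regular expressions.
Regex = Union[Letter, Lambda, Concat, Plus, Star]


def show(r: Regex) -> str:
    """Write r in the book's notation, adding parentheses (Rule 2(i)) only where needed."""
    if isinstance(r, Letter):
        return r.char
    if isinstance(r, Lambda):
        return "Λ"
    if isinstance(r, Plus):
        return f"{show(r.left)} + {show(r.right)}"
    if isinstance(r, Concat):
        return wrap(r.left, Plus) + wrap(r.right, Plus)
    if isinstance(r, Star):
        return wrap(r.inner, (Plus, Concat, Star)) + "*"
    raise TypeError(f"not a regular expression: {r!r}")


def wrap(r: Regex, needs_parens) -> str:
    """Parenthesize r when it is one of the node types in `needs_parens`."""
    text = show(r)
    return f"({text})" if isinstance(r, needs_parens) else text


a, b = Letter("a"), Letter("b")
print(show(Concat(Star(Plus(a, b)), a)))   # (a + b)*a
print(show(Star(Plus(Concat(a, a), b))))   # (aa + b)*
```

Treating parentheses as tree structure rather than as a node of their own is a design choice: it means (a) and a need not be told apart, since they define the same language.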

We could have included the plus sign as a superscript in $r_1^+$ as part of the definition, but since we know that $r_1^+ = r_1r_1^*$, this would add nothing valuable.

This is a language of language-definers. It is analogous to a book that lists all the books
in print. Every word in such a book is a book-definer. The same confusion occurs in everyday 
speech. The string “French” is both a word (an adjective) and a language-defining name (a 
noun). However difficult computer theory may seem, common English usage is much harder. 

Because of Rule 1, we may have trouble in distinguishing when we write an a whether we mean a, the letter in Σ; a, the word in Σ*; {a}, the one-word language; or a, the regular expression for that language. Context and typography will guide us.

As with the recursive definition of arithmetic expressions, we have included the use of parentheses as an option, not a requirement. Let us emphasize again the implicit parentheses in $r_1^*$. If $r_1 = aa + b$, then the expression $r_1^*$ technically refers to the expression

$r_1^* = aa + b^*$

which is the formal concatenation of the symbols for $r_1$ with the symbol *, but what we generally mean when we write $r_1^*$ is actually $(r_1)^*$:

$r_1^* = (r_1)^* = (aa + b)^*$

which is different. Both are regular expressions and both can be generated from the rules, but their languages are quite different. Care should always be taken to produce the expression we actually want, but this much care is too much to ask of mortals, and when we write $r_1^*$ in the rest of the book, we really mean $(r_1)^*$.
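The two readings of $r_1^*$ for $r_1 = aa + b$ can be told apart by a single word such as baa. A sketch using Python's `re`, where `|` stands for the book's +; the variable names are ours.

```python
import re

literal = "aa|b*"      # aa + b* : the star reaches only the b
grouped = "(aa|b)*"    # (aa + b)* : what we usually mean by r1*

for w in ["aa", "bbb", "baa", "aab", "aaaa"]:
    print(w, bool(re.fullmatch(literal, w)), bool(re.fullmatch(grouped, w)))
# aa True True
# bbb True True
# baa False True
# aab False True
# aaaa False True
```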

The definition we have given for regular expressions contains one subtle but important omission: the language φ. This language is not the same as the one represented by the regular expression Λ, or by any other regular expression that comes from our definition. We already have a symbol for the word with no letters and a symbol for the language with no words. Do we really need to invent yet another symbol for the regular expression that defines the language with no words? Would it simply be the regular expression with no characters, analogous to the word lambda (Λ) in the language of regular expressions? To the purely logical Vulcan mind, that would be the only answer, but since we have already employed the boldface lambda (Λ) to mean the regular expression defining the word lambda, we take the liberty of using the boldface phi (φ) to be the regular expression for the null language. We have already wasted enough thought on the various degrees of nothingness to qualify as medieval ecclesiastics; the desire for more precision would require psycho-active medication. For any r, we have

r + φ = r

and

φr = φ

but what is far less clear is exactly what φ* should mean. We shall avoid this philosophical crisis by never using this symbolism and avoiding those who do.


EXAMPLE 

Let us consider the language defined by the expression 

(a + b)* a (a + b)* 

At the beginning, we have (a + b)*, which stands for anything, that is, any string of a's and b's, then comes an a, then another anything. All told, the language is the set of all words over the alphabet Σ = {a b} that have an a in them somewhere. The only words left out are those that have only b's and the word Λ.

For example, the word abbaab can be considered to be derived from this expression by 
three different sets of choices: 

(Λ)a(bbaab) or (abb)a(ab) or (abba)a(b)
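Counting these derivations is mechanical: each a in the word can play the role of the middle a, so a word has as many derivations from this expression as it has a's. A hypothetical helper that lists them:

```python
def derivations(word):
    """Every way to read `word` as (a + b)* a (a + b)*.

    Each a in the word can serve as the middle a; whatever lies before
    and after it is what the two (a + b)* factors were chosen to stand for."""
    return [(word[:i], "a", word[i + 1:]) for i, ch in enumerate(word) if ch == "a"]


for before, middle, after in derivations("abbaab"):
    print(f"({before or 'Λ'}){middle}({after})")
# (Λ)a(bbaab)
# (abb)a(ab)
# (abba)a(b)
```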

If the only words left out of the language defined by the expression above are the words without a's (Λ and strings of b's), then these omitted words are exactly the language defined by the expression b*. If we combine these two, we should produce the language of all strings. In other words, since

all strings = (all strings with an a) + (all strings without an a) 
it should make sense to write 

(a + b)* = (a + b)*a(a + b)* + b* 

Here, we have added two language-defining expressions to produce an expression that defines the union of the two languages defined by the individual expressions. We have done this with languages as sets before, but now we are doing it with these emerging language-defining expressions.
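Read as sets, this identity says that the two pieces are complementary. A finite spot-check in Python, restricted by assumption to words of length at most 6, uses the set operators `|` for union and `&` for intersection; `lang` is a hypothetical helper.

```python
import re
from itertools import product


def lang(pattern, max_len=6):
    """Finite slice of a language: words over {a b} of length <= max_len matched in full."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


with_an_a = lang("(a|b)*a(a|b)*")
without_an_a = lang("b*")
everything = lang("(a|b)*")

print((with_an_a | without_an_a) == everything)  # True: + read as union
print(with_an_a & without_an_a)                  # set(): the two parts never overlap
```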

We should note that this use of the plus sign is consistent with the principle that in these 
expressions plus means choice. When we add sets to form a union, we are saying first 
choose the left set or the right set and then find a word in that set. In the expression above, 
first choose (a + b)*a(a + b)* or b* and then make further choices for the pluses and stars 
and finally arrive at a word that is included in the total language defined by the expression. 
In this way, we see that the use of plus for union is actually a natural equivalence of the use 
of plus for choice. 

Notice that this use of the plus sign is far from the normal meaning of addition in the algebraic sense, as we can see from

a* = a* + a*

a* = a* + a* + a*

a* = a* + aaa

For plus as union or plus as choice, these all make sense; for plus as algebra, they lead to 
presumptions of subtractions that are misguided. ■ 
EXAMPLE

The language of all words that have at least two a’s can be described by the expression 

(a + b)*a(a + b)*a(a + b)* 

= (some beginning)(the first important a)(some middle)(the second 
important a)(some end) 

where the arbitrary parts can have as many a’s (or b’s) as they want. ■ 

EXAMPLE 

Another expression that denotes all the words with at least two a's is

b*ab*a(a + b)* 

We scan through some jungle of b's (or no b's) until we find the first a, then more b's (or no b's), then the second a, then we finish up with anything. In this set are abbbabb and aaaaa.
We can write 

(a + b)*a(a + b)*a(a + b)* = b*ab*a(a + b)* 

where by the equal sign we do not mean that these expressions are equal algebraically in the 
same way as 

x + x = 2x

but that they are equal because they describe the same item, as with 


16th President = Abraham Lincoln 


We could write 


language((a + b)*a(a + b)*a(a + b)*) 

= language(b*ab*a(a + b)*) 

= all words with at least two a’s 

To be careful about this point, we say that two expressions are equivalent if they describe 
the same language. 

The expressions below also describe the language of words with at least two a’s: 

(a + b)*ab*ab*   (the two a's written out are the next-to-last a and the last a)

b*a(a + b)*ab*   (the two a's written out are the first a and the last a)
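Because equivalence means “same language,” a bounded brute-force comparison can catch mistakes, although passing it for all words up to length 7 is evidence rather than proof. This sketch compares all four expressions with a direct count of a's; Python's `re` writes the book's + as `|`, and `agrees_with` and `at_least_two_as` are hypothetical helpers.

```python
import re
from itertools import product

# The book's + (choice) is written | in Python's re syntax.
expressions = [
    "(a|b)*a(a|b)*a(a|b)*",   # some beginning, a, some middle, a, some end
    "b*ab*a(a|b)*",           # up to the first a, up to the second a, then anything
    "(a|b)*ab*ab*",           # ... the next-to-last a and the last a
    "b*a(a|b)*ab*",           # the first a ... the last a
]


def agrees_with(pattern, predicate, max_len=7):
    """Check pattern against predicate on every word over {a b} up to max_len letters."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(pattern, w)) != predicate(w):
                return False
    return True


def at_least_two_as(w):
    return w.count("a") >= 2


print([agrees_with(p, at_least_two_as) for p in expressions])  # [True, True, True, True]
```

Swapping in the exactly-two-a's expression b*ab*ab* makes the check fail (aaa is a counterexample).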


EXAMPLE 


If we wanted all the words with exactly two a’s, we could use the expression 

b*ab*ab* 
which describes such words as aab, baba, and bbbabbbab. To make the word aab, we let the first and second b* become Λ and the last become b. ■


EXAMPLE 

The language of all words that have at least one a and at least one b is somewhat trickier. If 
we write 

(a + b)*a(a + b)*b(a + b)* 

= (arbitrary)a(arbitrary)b(arbitrary)

we are then requiring that an a precede a b in the word. Such words as ba and bbaaaa are 
not included in this set. Since, however, we know that either the a comes before the b or the 
b comes before the a, we could define this set by the expression 

(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)* 

Here, we are still using the plus sign in the general sense of disjunction (or). We are taking 
the union of two sets, but it is more correct to think of this + as offering alternatives in 
forming words. 

There is a simpler expression that defines the same language. If we are confident that 
the only words that are omitted by the first term 

(a + b)*a(a + b)*b(a + b)* 

are the words of the form some b's followed by some a's, then it would be sufficient to add these specific exceptions into the set. These exceptions are all defined by the regular expression

bb*aa*

The language of all words over the alphabet Σ = {a b} that contain both an a and a b
is therefore also defined by the expression 

(a + b)*a(a + b)*b(a + b)* + bb*aa* 

Notice that it is necessary to write bb*aa* because b*a* will admit words we do not want, 
such as aaa. 

We have shown that 

(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)* = (a + b)*a(a + b)*b(a + b)* + bb*aa* 


EXAMPLE 




The only words that do not contain both an a and a b in them somewhere are the words of all a's, all b's, or Λ. When these are included, we get everything. Therefore, the regular expression

(a + b)*a(a + b)*b(a + b)* + bb*aa* + a* + b*

defines all possible strings of a's and b's. The word Λ is included in both a* and b*.

We can then write 

(a + b)* = (a + b)*a(a + b)*b(a + b)* + bb*aa* + a* + b* . 

which is not a very obvious equivalence at all. ■ 
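Both non-obvious identities from these two examples can be spot-checked in the same bounded way. In the sketch, Python's `re` writes each + as `|`, and the loop asserts the expected behaviour for every word of length at most 7 (an assumed bound, not a proof).

```python
import re
from itertools import product


def words(max_len=7):
    """Every string over {a b} with at most max_len letters."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


both_orders = "(a|b)*a(a|b)*b(a|b)*|(a|b)*b(a|b)*a(a|b)*"
with_exceptions = "(a|b)*a(a|b)*b(a|b)*|bb*aa*"
everything = "(a|b)*a(a|b)*b(a|b)*|bb*aa*|a*|b*"

for w in words():
    has_both = "a" in w and "b" in w
    assert bool(re.fullmatch(both_orders, w)) == has_both
    assert bool(re.fullmatch(with_exceptions, w)) == has_both
    assert re.fullmatch(everything, w)
print("all three checks pass for every word of length <= 7")
```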
We must not misinterpret the fact that every regular expression defines some language to mean that the associated language has a simple English description, such as in the preceding examples. It may very well be that the regular expression itself is the simplest description of the particular language. For example,

(Λ + ba*)(ab*a + ba*)*b(a* + b*a)bab*

probably has no cute concise alternate characterization. And even if it does reduce to something simple, there is no way of knowing this. That is, there is no algorithm to discover hidden meaning.


EXAMPLE 

All temptation to treat these language-defining expressions as if they were algebraic polynomials should be dispelled by these equivalences:

(a + b)* = (a + b)* + (a + b)*

(a + b)* = (a + b)* + a*

(a + b)* = (a + b)*(a + b)*

(a + b)* = a(a + b)* + b(a + b)* + Λ

(a + b)* = (a + b)*ab(a + b)* + b*a*

The last of these equivalences requires some explanation. It means that all the words that do not contain the substring ab (which are accounted for in the first term) are all a's, all b's, Λ, or some b's followed by some a's. All four missing types are covered by b*a*. ■
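The same bounded comparison confirms each of these equivalences on all words of length at most 7. In the sketch, an empty final alternative stands in for Λ, which Python's `re` accepts; `same_up_to` is a hypothetical helper.

```python
import re
from itertools import product

ALL = "(a|b)*"
forms = [
    "(a|b)*|(a|b)*",
    "(a|b)*|a*",
    "(a|b)*(a|b)*",
    "a(a|b)*|b(a|b)*|",       # the empty last alternative plays the role of Λ
    "(a|b)*ab(a|b)*|b*a*",
]


def same_up_to(p, q, max_len=7):
    """Do p and q accept exactly the same words over {a b} of length <= max_len?"""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return False
    return True


print([same_up_to(ALL, f) for f in forms])  # [True, True, True, True, True]
```

Dropping the b*a* term from the last form makes the comparison fail, since words such as Λ and ba contain no ab.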

Usually, when we employ the star operator, we are defining an infinite language. We can represent a finite language by using the plus sign (union sign) alone. If the language L over the alphabet Σ = {a b} contains only the finite list of words

L = {abba baaa bbbb}

then we can represent L by the symbolic expression 

L = language(abba + baaa + bbbb) 

Every word in L is some choice of options of this expression.

If L is a finite language that includes the null word Λ, then the expression that defines L must also employ the symbol Λ.

For example, if 

L = {Λ a aa bbb}

then the symbolic expression for L must be

L = language(Λ + a + aa + bbb)

The symbol Λ is a very useful addition to our system of language-defining symbolic expressions.

EXAMPLE 

Let V be the language of all strings of a's and b's in which either the strings are all b's or else there is an a followed by some b's. Let V also contain the word Λ:

V = {Λ a b ab bb abb bbb abbb bbbb . . .}
We can define V by the expression

b* + ab*

where the word Λ is included in the term b*. Alternatively, we could define V by the expression

(Λ + a)b*

This would mean that in front of the string of some b's, we have the option of either adding an a or nothing. Since we could always write b* = Λb*, we have what appears to be some sort of distributive law at work:

Λb* + ab* = (Λ + a)b*

We have factored out the b* just as in algebra. It is because of this analogy to algebra that we have denoted our disjunction by the plus sign instead of the union sign ∪ or the symbolic logic sign ∨. Sometimes, we like it to look algebraic; sometimes, we do not. ■

We have a hybrid system: The * is somewhat like an exponent and the + is somewhat like addition. But the analogies to algebra should be approached very suspiciously, since addition in algebra never means choice and algebraic multiplication has properties different from concatenation (even though we sometimes conventionally refer to it as a product):

ab = ba in algebra, they are the same numerical product 

ab ≠ ba in formal languages, they are different words

Let us reconsider the language 


T = {a c ab cb abb cbb . . .}

T can be defined as above by

(a + c)b*

but it can also be defined by

ab* + cb*


This is another example of the distributive law. 

However, the distributive law must be used with extreme caution. Sometimes, it is difficult to determine whether the law is applicable. Expressions may be distributed but operators cannot. Certainly, the star alone cannot always be distributed without changing the meaning of the expression. For example, as we have noted earlier, (ab)* ≠ a*b*. The language associated with (ab)* is words with alternating a's and b's, whereas the language associated with a*b* is only strings where all the a's (if any) precede all the b's (also if any).
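A bounded check, assuming Python's `re` syntax and a length limit of 6, agrees with the distributive examples above and shows the star failing to distribute. The helper `same_language` is hypothetical.

```python
import re
from itertools import product


def same_language(p, q, alphabet="ab", max_len=6):
    """Compare p and q on every word over `alphabet` with at most max_len letters."""
    return all(
        bool(re.fullmatch(p, w)) == bool(re.fullmatch(q, w))
        for n in range(max_len + 1)
        for w in ("".join(t) for t in product(alphabet, repeat=n))
    )


print(same_language("b*|ab*", "(|a)b*"))                    # True: Λb* + ab* = (Λ + a)b*
print(same_language("ab*|cb*", "(a|c)b*", alphabet="abc"))  # True: ab* + cb* = (a + c)b*
print(same_language("(ab)*", "a*b*"))                       # False: the star does not distribute
```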

To make the identification between the regular expressions and their associated languages more explicit, we need to define the operation of multiplication of sets of words, a concept we have used informally already.


DEFINITION 

If S and T are sets of strings of letters (whether they are finite or infinite sets), we define the product set of strings of letters to be

ST = {all combinations of a string from S concatenated with a string from T in that order}
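For finite sets this product is a one-line set comprehension. The sketch below, with the hypothetical names `product_set` and `by_length`, uses "" for Λ and reproduces two of the products worked out in the examples that follow.

```python
def product_set(S, T):
    """ST: every string of S followed by every string of T, in that order."""
    return {s + t for s in S for t in T}


def by_length(words):
    """Shortest first, then alphabetical, for readable printing."""
    return sorted(words, key=lambda w: (len(w), w))


print(by_length(product_set({"a", "aa", "aaa"}, {"bb", "bbb"})))
# ['abb', 'aabb', 'abbb', 'aaabb', 'aabbb', 'aaabbb']
print(by_length(product_set({"a", "bb", "bab"}, {"", "bbbb"})))  # "" plays the role of Λ
# ['a', 'bb', 'bab', 'abbbb', 'bbbbbb', 'babbbbb']
```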
EXAMPLE

If

S = {a aa aaa}   T = {bb bbb}

then

ST = {abb abbb aabb aabbb aaabb aaabbb}

Note that these words are not in proper lexicographic order.


EXAMPLE 

If

S = {a bb bab}   T = {a ab}

then

ST = {aa aab bba bbab baba babab}


EXAMPLE 


If

P = {a bb bab}   Q = {Λ bbbb}

then

PQ = {a bb bab abbbb bbbbbb babbbbb}


EXAMPLE 

If L is any language, then 

LΛ = ΛL = L


EXAMPLE 

If

M = {Λ x xx}   N = {Λ y yy yyy yyyy . . .}

then

MN = {Λ y yy yyy yyyy . . .
x xy xyy xyyy xyyyy . . .
xx xxy xxyy xxyyy xxyyyy . . .}

Using regular expressions, we can write these five examples as 

(a + aa + aaa)(bb + bbb) = abb + abbb + aabb + aabbb + aaabb + aaabbb 
(a + bb + bab)(a + ab) = aa + aab + bba + bbab + baba + babab 
$(a + bb + bab)(\Lambda + bbbb) = a + bb + bab + ab^4 + b^6 + bab^5$

rΛ = Λr = r

(Λ + x + xx)(y*) = y* + xy* + xxy*

EXAMPLE

If FRENCH and GERMAN are their usual languages, then the product FRENCHGERMAN is the language of all strings that start with a FRENCH word and finish with a GERMAN word. Some words in this language are ennuiverboten and souffléGesundheit. ■

It might not be clear why we cannot just leave the rules for associating a language with a regular expression on the informal level, with the informal instruction “make choices for + and *.” The reason is that the informal phrase “make choices” is much harder to explain precisely than the formal mathematical presentation below.

LANGUAGES ASSOCIATED WITH REGULAR EXPRESSIONS 

We are now ready to give the rules for associating a language with every regular expression. 
As we might suspect, the method for doing this is given recursively. 

DEFINITION 

The following rules define the language associated with any regular expression:

Rule 1 The language associated with the regular expression that is just a single letter is that one-letter word alone and the language associated with Λ is just {Λ}, a one-word language.

Rule 2 If $r_1$ is a regular expression associated with the language $L_1$ and $r_2$ is a regular expression associated with the language $L_2$, then:

(i) The regular expression $(r_1)(r_2)$ is associated with the product $L_1L_2$, that is, the language $L_1$ times $L_2$:

$\text{language}(r_1r_2) = L_1L_2$

(ii) The regular expression $r_1 + r_2$ is associated with the language formed by the union of the sets $L_1$ and $L_2$:

$\text{language}(r_1 + r_2) = L_1 + L_2$

(iii) The language associated with the regular expression $(r_1)^*$ is $L_1^*$, the Kleene closure of the set $L_1$ as a set of words:

$\text{language}(r_1^*) = L_1^*$ ■

Once again, this collection of rules proves recursively that there is some language associated with every regular expression. As we build up a regular expression from the rules, we simultaneously are building up the corresponding language.

The rules seem to show us how we can interpret the regular expression as a language, but they do not really tell us how to understand the language. By this we mean that if we apply the rules above to the regular expression


(a + b)*a(a + b)*b(a + b)* + bb*aa* 
we can develop a description of some language, but can we understand that this is the language of all strings that have both an a and a b in them? This is a question of meaning.
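Applying the rules mechanically, as opposed to understanding the result, is easy to automate. Since these languages are usually infinite, the sketch below (our own construction, not the book's) computes only the words up to a chosen length, representing expressions as nested tuples. Each branch follows one rule of the definition; run on (a + b)*a(a + b)*b(a + b)* + bb*aa*, it produces a list of words but no English description of them.

```python
def lang(r, max_len):
    """Words of length <= max_len in the language of regular expression r.

    r is a nested tuple: ("letter", c), ("lambda",), ("concat", r1, r2),
    ("plus", r1, r2) or ("star", r1)."""
    kind = r[0]
    if kind == "letter":                 # Rule 1: the one-letter word
        return {r[1]} if max_len >= 1 else set()
    if kind == "lambda":                 # Rule 1: {Λ}, with "" standing for Λ
        return {""}
    if kind == "concat":                 # Rule 2(i): the product L1L2
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {u + v for u in left for v in right if len(u) + len(v) <= max_len}
    if kind == "plus":                   # Rule 2(ii): the union L1 + L2
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "star":                   # Rule 2(iii): the Kleene closure L1*
        base = lang(r[1], max_len)
        closure, frontier = {""}, {""}
        while frontier:                  # add one more factor until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= max_len} - closure
            closure |= frontier
        return closure
    raise ValueError(f"unknown node {kind!r}")


def cat(*parts):
    """Concatenate several expressions left to right."""
    result = parts[0]
    for part in parts[1:]:
        result = ("concat", result, part)
    return result


a, b = ("letter", "a"), ("letter", "b")
anything = ("star", ("plus", a, b))                  # (a + b)*
expr = ("plus",
        cat(anything, a, anything, b, anything),     # (a + b)*a(a + b)*b(a + b)*
        cat(b, ("star", b), a, ("star", a)))         # + bb*aa*

words = lang(expr, 5)
print(len(words), all("a" in w and "b" in w for w in words))  # 52 True
```

The count 52 is the number of strings of length at most 5 that use both letters (2 + 6 + 14 + 30). Confirming that this expression yields *only* such words still takes the kind of reasoning the text describes.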

This correspondence between regular expressions and languages leaves open two other questions. We have already seen examples where completely different regular expressions end up describing the same language. Is there some way of telling when this happens? By “way” we mean, of course, an algorithm. We shall present an algorithmic procedure in Chapter 11 to determine whether or not two regular expressions define the same language.

Another fundamental question is this: We have seen that every regular expression is associated with some language; is it also true that every language can be described by a regular expression? In our next theorem, we show that every finite language can be defined by a regular expression. The situation for languages with infinitely many words is different. We shall prove in Chapter 10 that there are some languages that cannot be defined by any regular expression.

As to the first and perhaps most important question, the question of understanding regular expressions, we have not a clue. Before we can construct an algorithm for obtaining understanding, we must have some good definition of what it means to understand. We may be centuries away from being able to do that, if it can be done at all.


FINITE LANGUAGES ARE REGULAR 
THEOREM 5 

If L is a finite language (a language with only finitely many words), then L can be defined by 
a regular expression. In other words, all finite languages are regular. 


PROOF 

To make one regular expression that defines the language L, turn all the words in L into boldface type and insert plus signs between them. Voila.

For example, the regular expression that defines the language

L = {baa abbba bababa}

is

baa + abbba + bababa


If

L = {aa ab ba bb}

the algorithm described above gives the regular expression

aa + ab + ba + bb

Another regular expression that defines this language is

(a + b)(a + b)



so the regular expression need not be unique, but so what. We need only show that at least 
one regular expression exists. 

The reason this trick only works for finite languages is that an infinite language would become a regular expression that is infinitely long, which is forbidden. ■

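The construction in the proof is mechanical enough to code directly. This is a sketch with hypothetical function names; it sorts the words only so the output is predictable (the order of the terms does not matter) and, as an assumption of the sketch, refuses the empty language. The second function writes the same union in Python's re syntax, where a Λ in the language becomes an empty alternative, so that membership can be spot-checked.

```python
import re
from itertools import product

def regex_for_finite(language):
    """Theorem 5's construction: write out every word and put plus signs between them."""
    if not language:
        # Assumption of this sketch: only nonempty finite languages are handled.
        raise ValueError("expected at least one word")
    return " + ".join(w if w else "Λ" for w in sorted(language))

def python_pattern(language):
    """The same union in Python's re syntax; the empty word becomes an empty alternative."""
    return "|".join(re.escape(w) for w in sorted(language))

L = {"baa", "abbba", "bababa"}
print(regex_for_finite(L))  # abbba + baa + bababa

# Membership check: the pattern matches exactly the words of L (lengths 0 to 6).
pattern = re.compile(python_pattern(L))
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert bool(pattern.fullmatch(w)) == (w in L)
```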

EXAMPLE 


L = {Λ x xx xxx xxxx xxxxx}

The regular expression we get from the theorem is

Λ + x + xx + xxx + xxxx + xxxxx

A more elegant regular expression for this language is

(Λ + x)^5

Of course, the 5 is, strictly speaking, not a legal symbol for a regular expression, although we all understand it means

(Λ + x)(Λ + x)(Λ + x)(Λ + x)(Λ + x) ■


HOW HARD IT IS TO UNDERSTAND 
A REGULAR EXPRESSION 


Let us examine some regular expressions and see if we are lucky enough to understand 
something about the languages they represent. 


EXAMPLE 


Consider the expression 

(a + b)*(aa + bb)(a + b)* 

This is the set of strings of a's and b's that at some point contain a double letter. We can think of it as


(arbitrary)(double letter) (arbitrary) 

Let us now ask, “What strings do not contain a double letter?” Some examples are Λ a b ab ba aba bab abab baba .... The expression (ab)* covers all of these except those that begin with b or end in a. Adding these choices gives us the regular expression

(Λ + b)(ab)*(Λ + a)

Combining these two gives 

(a + b)*(aa + bb)(a + b)* + (Λ + b)(ab)*(Λ + a)

Who among us is so boldfaced as to claim that seeing the expression above they could tell immediately that it defines all strings? ■

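The claim that the combined expression defines every string can at least be checked on all short strings. The sketch assumes Python's re syntax, rewriting + as | and each factor (Λ + b) or (Λ + a) as b? or a?; it also confirms that the second term by itself matches exactly the strings with no double letter.

```python
import re
from itertools import product

# (a + b)*(aa + bb)(a + b)* + (Λ + b)(ab)*(Λ + a), in Python's re syntax
DOUBLE = r"(a|b)*(aa|bb)(a|b)*"
NO_DOUBLE = r"b?(ab)*a?"
EVERYTHING = re.compile(DOUBLE + "|" + NO_DOUBLE)

for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert EVERYTHING.fullmatch(w), w
        # the second term alone describes exactly the words with no double letter
        no_double = "aa" not in w and "bb" not in w
        assert bool(re.fullmatch(NO_DOUBLE, w)) == no_double, w
print("all 2047 words of length 0 to 10 match")
```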





EXAMPLE

Consider the regular expression below:

E = (a + b)*a(a + b)*(a + Λ)(a + b)*a(a + b)*
  = (arbitrary) a (arbitrary) (a or nothing) (arbitrary) a (arbitrary)

One obvious fact is that all the words in the language of E must have at least two a's in them. Let us break up the middle plus sign into its two cases: Either the middle factor contributes an a or else it contributes a Λ. Therefore,

E = (a + b)*a(a + b)*a(a + b)*a(a + b)*
  + (a + b)*a(a + b)*Λ(a + b)*a(a + b)*

This is a more detailed use of the distributive law. The first term above clearly represents all words that have at least three a's in them. Before we analyze the second term, let us make the observation that

(a + b)*Λ(a + b)*

which occurs in the middle of the second term, is only another way of saying “any string whatsoever” and could be replaced with the more direct expression

(a + b)*

This would reduce the second term of the expression to

(a + b)*a(a + b)*a(a + b)*

which we have already seen is a regular expression representing all words that have at least two a's in them.

Therefore, the language associated with E is the union of all strings that have three or more a's with all strings that have two or more a's. But since all strings with three or more a's are themselves already strings with two or more a's, this whole language is just the second set alone.

The language associated with E is no different from the language associated with

(a + b)*a(a + b)*a(a + b)*

which we have examined before with three of its avatars. ■

It is possible by repeated application of the rules for forming regular expressions to produce an expression in which the star operator is applied to a subexpression that already has a star in it.

Some examples are

(a + b*)*    (aa + ab*)*    ((a + bbba*) + ba*b)*

In the first of these expressions, the internal * adds nothing to the language

(a + b*)* = (a + b)*

since all possible strings of a's and b's are described by both expressions. Also, in accordance with Theorem 1 on p. 18,

(a*)* = a*

However,

(aa + ab*)* ≠ (aa + ab)*

since the language for the expression on the left includes the word abbabb, whereas the language on the right does not. (The language defined by the regular expression on the right cannot contain any word with a double b.)

If one had not just seen this explained, would it be obvious?

EXAMPLE

Consider the regular expression

(a*b*)*

The language defined by this expression is all strings that can be made up of factors of the form a*b*, but since both the single letter a and the single letter b are words of the form a*b*, this language contains all strings of a's and b's. It cannot contain more than everything, so

(a*b*)* = (a + b)*

The equation above casts a major doubt on the possibility of finding a set of algebraic rules to reduce one regular expression to another equivalent one. Yet, it is still unknown whether this can be done. ■

EXAMPLE

Consider the language defined by the regular expression

b*(abb*)*(Λ + a)

This is the language of all words without a double a. The typical word here starts with some b's. Then come repeated factors of the form abb* (an a followed by at least one b). Then we finish up with a final a or we leave the last b's as they are. This is another starred expression with a star inside. ■

If we are simply interested in being devilish and creating a mess, we can do so recursively. Let us start with the observation that all strings either have a double a or isolated a's, as in the example above:

(a + b)* = (a + b)*aa(a + b)* + b*(abb*)*(Λ + a)

Now, let us use (a*b*)* instead of the first (a + b)*:

(a + b)* = (a*b*)*aa(a + b)* + b*(abb*)*(Λ + a)

Now, once we note that the entire right-hand side is equivalent to (a + b)*, we can use it (the whole expression) to substitute for the subexpression (a + b)* on the right. This gives

(a + b)* = (a*b*)*aa[(a*b*)*aa(a + b)* + b*(abb*)*(Λ + a)] + b*(abb*)*(Λ + a)

There is still a substring (a + b)* on the right-hand side and we can again recursively replace it by the whole expression above. And so on, ad nauseam. The sole application of creating needlessly complicated expressions equivalent to much simpler ones is to make the instructor's job in grading homework exponentially more difficult.


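Each equivalence and inequivalence in this section can be spot-checked by brute force. The sketch below is only evidence up to length 8, not a proof, and it assumes Python's re syntax as a translation of the book's notation: + becomes |, and both (a + Λ) and (Λ + a) become a?. The helper agree is hypothetical.

```python
import re
from itertools import product

def words(max_len):
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def agree(p, q, max_len=8):
    """Do patterns p and q accept the same words up to max_len? Evidence, not proof."""
    P, Q = re.compile(p), re.compile(q)
    return all(bool(P.fullmatch(w)) == bool(Q.fullmatch(w)) for w in words(max_len))

ANY = "(a|b)*"
# E = (a + b)*a(a + b)*(a + Λ)(a + b)*a(a + b)*, with (a + Λ) written as a?
E = ANY + "a" + ANY + "a?" + ANY + "a" + ANY
TWO_AS = ANY + "a" + ANY + "a" + ANY

checks = {
    "E = (a + b)*a(a + b)*a(a + b)*": agree(E, TWO_AS),
    "(a + b*)* = (a + b)*": agree(r"(a|b*)*", ANY),
    "(a*b*)* = (a + b)*": agree(r"(a*b*)*", ANY),
    "(aa + ab*)* = (aa + ab)*": agree(r"(aa|ab*)*", r"(aa|ab)*"),
}
for claim, holds in checks.items():
    print(f"{claim}: {holds}")  # True, True, True, False

# The witness from the text: abbabb is on the left of the inequality, not the right.
print(bool(re.fullmatch(r"(aa|ab*)*", "abbabb")), bool(re.fullmatch(r"(aa|ab)*", "abbabb")))

# b*(abb*)*(Λ + a) describes exactly the words without a double a.
print(all(bool(re.fullmatch(r"b*(abb*)*a?", w)) == ("aa" not in w) for w in words(8)))
```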
One very interesting example, which we consider now in great detail and carry with us 
throughout the book, is 

E = [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]*

This regular expression represents the collection of all words that are made up of “syllables” of three types:

$\text{type}_1 = aa$
$\text{type}_2 = bb$
$\text{type}_3 = (ab + ba)(aa + bb)^*(ab + ba)$

$E = [\text{type}_1 + \text{type}_2 + \text{type}_3]^*$

Suppose that we are scanning along a word in the language of E from left to right, reading the letters two at a time. First, we come to a double a ($\text{type}_1$), then to a double b ($\text{type}_2$), then to another double a ($\text{type}_1$ again). Then perhaps we come upon a pair of letters that are not the same. Say, for instance, that the next two letters are ba. This must begin a substring of $\text{type}_3$. It starts with an undoubled pair (either ab or ba), then it has a section of doubled letters (many repetitions of either aa or bb), and then it finally ends with another undoubled pair (either ab or ba again). One property of this section of the word is that it has an even number of a's and an even number of b's, counting the two undoubles and all the doubles. After this section of $\text{type}_3$, we could proceed with more sections of $\text{type}_1$ or $\text{type}_2$ until we encountered another undoubled pair, starting another $\text{type}_3$ section. We know that another undoubled pair will be coming up to balance off the initial one. The total effect is that every word of the language of E contains an even number of a's and an even number of b's.

If this were all we wanted to conclude, we could have done so more quickly. All words in the language of E are made up of these three types of substrings and, since each of these three has an even number of a's and an even number of b's, the whole word must, too. However, a stronger statement is also true. All strings with an even number of a's and an even number of b's belong to the language of E. The proof of this parallels our argument above.

Consider a word w with even a's and even b's. If the first two letters are the same, we have a $\text{type}_1$ or $\text{type}_2$ syllable. Scan over the doubled letter pairs until we come to an unmatched pair such as ab or ba. Continue scanning by skipping over the double a's and double b's that get in the way until we find the balancing unmatched pair (either ab or ba) to even off the count of a's and b's. If the word ends before we find such a pair, the a's and b's are not even. Once we have found the balancing unmatched pair, we have completed a syllable of $\text{type}_3$. By “balancing,” we do not mean it has to be the same unmatched pair: ab can be balanced by either ab or ba. Consider them bookends or open and close parentheses; whenever we see one, we must later find another. Therefore, E represents the language of all strings with even a's and even b's.

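The argument above can be tested mechanically. This sketch, assuming Python's re module and a translation of the book's + into |, compares E with a direct count of a's and b's on every string of length at most 12.

```python
import re
from itertools import product

# E = [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]*, with + written as |
E = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def even_even(w):
    """Even number of a's and even number of b's."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0

checked = 0
for n in range(13):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert bool(E.fullmatch(w)) == even_even(w), w
        checked += 1
print(checked, "words agree")  # 8191 words agree
```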
Let us consider this as a computer algorithm. We are about to feed in a long string of a's and b's, and we want to determine whether this string has the property that the number of a's is even and the number of b's is even. One method is to keep two binary flags, the a flag and the b flag. Every time an a is read, the a flag is reversed (0 to 1, or 1 to 0); every time a b is read, the b flag is reversed. We start both flags at 0 and check to be sure they are both 0 at the end. This method will work.

But there is another method that also works which uses only one flag: the method that corresponds to the discussion above. Let us have only one flag called the $\text{type}_3$ flag. We read the letters in two at a time. If they are the same, then we do not touch the $\text{type}_3$ flag, since we have a factor of $\text{type}_1$ or $\text{type}_2$. If, however, the two letters read do not match, we throw the $\text{type}_3$ flag. If the flag starts at 0, then whenever it is 1, we are in the middle of a $\text{type}_3$ factor; whenever it is 0, we are not. If it is 0 at the end, then the input string contains an even number of a's and an even number of b's.

For example, if the input is 

(aa)(ab)(bb)(ba)(ab)(bb)(bb)(bb)(ab)(ab)(bb)(ba)(aa) 

the flag is reversed six times and ends at 0. 

We will refer to this language again later, so we give it the name EVEN-EVEN. 

EVEN-EVEN = (A aa bb aaaa aabb abab abba baab baba 
bbaa bbbb aaaaaa aaaabb aaabab . . .} 

Notice that there do not have to be the same number of a’s and b's, just an even quantity 
of each. 
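Both flag methods are short enough to write out. The sketch below is our own rendering of them, assuming every input letter is a or b; the text does not say what happens to a string of odd length, so the one-flag version rejects such strings up front (an odd-length string cannot have both counts even). It reproduces the six flag reversals on the example input and checks that the two methods agree on all short strings.

```python
from itertools import product

def even_even_two_flags(w):
    """Keep an a flag and a b flag; reverse the matching flag on every letter."""
    a_flag = b_flag = 0
    for letter in w:
        if letter == "a":
            a_flag ^= 1
        else:  # assumes the only other letter is b
            b_flag ^= 1
    return a_flag == 0 and b_flag == 0

def even_even_one_flag(w):
    """Read letters two at a time; throw the single type-3 flag on each undoubled pair."""
    if len(w) % 2 == 1:
        # Assumption: odd lengths are not discussed in the text; such a word
        # cannot have both counts even, so reject it before pairing letters.
        return False
    type3_flag = 0
    for i in range(0, len(w), 2):
        if w[i] != w[i + 1]:  # the pair is ab or ba
            type3_flag ^= 1
    return type3_flag == 0

pairs = ["aa", "ab", "bb", "ba", "ab", "bb", "bb", "bb", "ab", "ab", "bb", "ba", "aa"]
example = "".join(pairs)
print(sum(p[0] != p[1] for p in pairs))                           # 6
print(even_even_one_flag(example), even_even_two_flags(example))  # True True

# The two methods agree on every string of length 0 through 12.
for n in range(13):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert even_even_one_flag(w) == even_even_two_flags(w), w
```

The one-flag version works because, in a word of even length, the number of a's has the same parity as the number of undoubled pairs, and the number of b's then has that parity too.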





PROBLEMS 


1. Let $r_1$, $r_2$, and $r_3$ be three regular expressions. Show that the language associated with $(r_1 + r_2)r_3$ is the same as the language associated with $r_1r_3 + r_2r_3$. Show that $r_1(r_2 + r_3)$ is equivalent to $r_1r_2 + r_1r_3$. This will be the same as proving a “distributive law” for regular expressions.

For Problems 2 through 11, construct a regular expression defining each of the following languages over the alphabet Σ = {a b}:

2. All words in which a appears tripled, if at all. This means that every clump of a's contains 3 or 6 or 9 or 12 ... a's.

3. All words that contain at least one of the strings $s_1$, $s_2$, $s_3$, or $s_4$.

4. All words that contain exactly two b's or exactly three b's, not more. 

5. (i) All strings that end in a double letter. 

(ii) All strings that do not end in a double letter. 

6. All strings that have exactly one double letter in them. 

7. All strings in which the letter b is never tripled. This means that no word contains the 
substring bbb. 

8. All words in which a is tripled or b is tripled, but not both. This means each word contains the substring aaa or the substring bbb but not both.

9. (i) All words that do not have the substring ab. 

(ii) All words that do not have both the substrings bba and abb. 

10. All strings in which the total number of a's is divisible by 3 no matter how they are distributed, such as aabaabbaba.

11. (i) All strings in which any b's that occur are found in clumps of an odd number at a time, such as abaabbbab.

(ii) All strings that have an even number of a's and an odd number of b's. 

(iii) All strings that have an odd number of a’s and an odd number of b's. 

12. (i) Let us reconsider the regular expression

(a + b)*a(a + b)*b(a + b)*

Show that this is equivalent to

(a + b)*ab(a + b)*

in the sense that they define the same language.

(ii) Show that 


(a + b)*ab(a + b)* + b*a* = (a + b)* 


(iii) Show that 


(a + b)*ab[(a + b)*ab(a + b)* + b*a*] + b*a* = (a + b)*

(iv) Is (iii) the last variation of this theme or are there more beasts left in this cave? 

13. We have defined the product of two sets of strings in general. If we apply this to the case where both factors are the same set, S = T, we obtain squares, $S^2$. Similarly, we can define $S^3$, $S^4$, .... Show that it makes some sense to write:

(i) $S^* = \Lambda + S + S^2 + S^3 + S^4 + \cdots$

(ii) $S^+ = S + S^2 + S^3 + S^4 + \cdots$

14. If the only difference between L and $L^*$ is the word Λ, is the only difference between $L^2$ and $L^*$ the word Λ?

For Problems 15 through 17, show that the following pairs of regular expressions define the same language over the alphabet Σ = {a b}:

15. (i) (ab)*a and a(ba)* 

(ii) (a* + b)* and (a + b)* 

(iii) (a* + b*)* and (a + b)* 

16. (i) Λ* and Λ

(ii) (a*b)*a* and a*(ba*)* 

(iii) (a*bbb)*a* and a*(bbba*)* 

17. (i) ((a + bb)*aa)* and Λ + (a + bb)*aa

(ii) (aa)*(Λ + a) and a*

(iii) a(aa)*(Λ + a)b + b and a*b

(iv) a(ba + a)*b and aa*b(aa*b)*

(v) Λ + a(a + b)* + (a + b)*aa(a + b)* and ((b*a)*ab*)*

18. Describe (in English phrases) the languages associated with the following regular expressions:

(i) (a + b)*a(Λ + bbbb)

(ii) (a(a + bb)*)* 

(iii) (a(aa)*b(bb)*)* 

(iv) (b(bb)*)*(a(aa)*b(bb)*)* 

(v) (b(bb)*)*(a(aa)*b(bb)*)*(a(aa)*)* 

(vi) ((a + b)a)* 

19. (D. N. Arden) Let R, S, and T be three languages and assume that Λ is not in S. Prove the following statements:

(i) From the premise that R = SR + T, we can conclude that R = S*T.

(ii) From the premise that R = S*T, we can conclude that R = SR + T.

20. (i) Explain why we can take any pair of equivalent regular expressions and replace the letter a in both with any regular expression R and the letter b with any regular ex-

Finite Automata


YET ANOTHER METHOD FOR DEFINING LANGUAGES 


Several games that children play fit the following description. Pieces are set up on a playing board. Dice are thrown (or a wheel is spun), and a number is generated at random. Depending on the number, the pieces on the board must be rearranged in a fashion completely specified by the rules. The child has no options about changing the board. Everything is determined by the dice. Usually, it is then some other child's turn to throw the dice and make his or her move, but this hardly matters, because no skill or choice is involved. We could eliminate the opponent and have the one child move first the white pieces and then the black. Whether or not the white pieces win the game is dependent entirely on what sequence of numbers is generated by the dice, not on who moves them.

Let us look at all possible positions of the pieces on the board and call them states. The game changes from one state to another in a fashion determined by the input of a certain number. For each possible number, there is one and only one resulting state. We should allow for the possibility that after a number is entered, the game is still in the same state as it was before. (For example, if a player who is in “jail” needs to roll doubles in order to get out, any other roll leaves the board in the same state.) After a certain number of rolls, the board arrives at a state that means a victory for one of the players and the game is over. We call this a final state. There might be many possible final states that result in victory for this player. In computer theory, these are also called halting states, terminal states, or accepting states.

Beginning with the initial state (which we presume to be unique), some input sequences 
of numbers lead to victory for the first child and some do not. 

Let us put this game back on the shelf and take another example. A child has a simple computer (input device, processing unit, memory, output device) and wishes to calculate the sum of 3 plus 4. The child writes a program, which is a sequence of instructions that are fed into the machine one at a time. Each instruction is executed as soon as it is read, and then the next instruction is read. If all goes well, the machine outputs the number 7 and terminates execution. We can consider this process to be similar to the board-game. Here the board is the computer and the different arrangements of pieces on the board correspond to the different arrangements of 0's and 1's in the cells of memory. Two machines are in the same state if their output pages look the same and their memories look the same cell by cell.

The computer is also deterministic, by which we mean that, on reading one particular input instruction, the machine converts itself from the state it was in to some particular other state (or remains in the same state if given a NO-OP), where the resultant state is completely determined by the prior state and the input instruction. Nothing else. No choice is involved. No knowledge is required of the state the machine was in six instructions ago. Some sequences of input instructions may lead to success (printing the 7) and some may not. Success is entirely determined by the sequence of inputs. Either the program will work or it will not.

As in the case of the board-game, in this model we have one initial state and the possibility of several successful final states. Printing the 7 is what is important; what is left in memory does not matter.

One small difference between these two situations is that in the child’s game the number 
of pieces of input is determined by whether either player has yet reached a final state, 
whereas with the computer the number of pieces of input is a matter of choice made before 
run time. Still, the input string is the sole determinant as to whether the game child or the 
computer child wins his or her victory. 

In the first example, we can consider the set of all possible dice rolls to be the letters of an alphabet. We can then define a certain language as the set of strings of those letters that lead to success, that is, lead to a final victory state. Similarly, in the second example we can consider the set of all computer instructions as the letters of an alphabet. We can then define a language to be the set of all words over this alphabet that lead to success. This is the language whose words are all programs that print a 7.

The most general model, of which both of these examples are instances, is called a finite automaton: “finite” because the number of possible states and number of letters in the alphabet are both finite, and “automaton” because the change of states is totally governed by the input. The determination of what state is next is automatic (involuntary and mechanical), not willful, just as the motion of the hands of a clock is automatic, while the motion of the hands of a human is presumably the result of desire and thought. We present the precise definition below. Automaton comes to us from the Greek, so its correct plural is automata.

DEFINITION 

A finite automaton is a collection of three things:

1. A finite set of states, one of which is designated as the initial state, called the start state, and some (maybe none) of which are designated as final states.

2. An alphabet Σ of possible input letters.

3. A finite set of transitions that tell for each state and for each letter of the input alphabet which state to go to next. ■

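The three-part definition maps naturally onto a small data structure. This is a sketch with a hypothetical class name, FiniteAutomaton, storing the transitions as a dictionary keyed by (state, letter). The constructor insists on a transition for every state and letter, as part 3 of the definition requires, and accepts reads the input letter by letter from the start state, in the manner described below.

```python
from dataclasses import dataclass

@dataclass
class FiniteAutomaton:
    """A finite set of states, an alphabet, and transitions (hypothetical class name)."""
    states: set
    alphabet: set
    start: str
    finals: set        # some (maybe none) of the states
    transitions: dict  # (state, letter) -> next state

    def __post_init__(self):
        if self.start not in self.states or not self.finals <= self.states:
            raise ValueError("start and final states must be among the states")
        # Part 3: one next state for each state and each letter of the alphabet
        for state in self.states:
            for letter in self.alphabet:
                if self.transitions.get((state, letter)) not in self.states:
                    raise ValueError(f"no valid transition for {(state, letter)}")

    def accepts(self, word):
        """Start at the start state, follow one transition per letter,
        and accept if the last state reached is a final state."""
        state = self.start
        for letter in word:
            state = self.transitions[(state, letter)]  # KeyError if letter not in alphabet
        return state in self.finals

loops = {("s", "a"): "s", ("s", "b"): "s"}
everything = FiniteAutomaton({"s"}, {"a", "b"}, "s", {"s"}, loops)
nothing = FiniteAutomaton({"s"}, {"a", "b"}, "s", set(), loops)
print(everything.accepts("abba"), nothing.accepts("abba"))  # True False
```

The one-state machines are our own examples: when the single state is both start and final, every string is accepted; the same machine with no final states accepts nothing.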
The definition above is incomplete in the sense that it describes what a finite automaton is but not how it works. It works by being presented with an input string of letters that it reads letter by letter starting at the leftmost letter. Beginning at the start state, the letters determine a sequence of states. The sequence ends when the last input letter has been read.

Instead of writing out the whole phrase “finite automaton,” it is customary to refer to one by its initials, FA. Computer theory is rife with acronyms, so we have many in this book. The term FA is read by naming its letters, so we say “an FA” even though it stands for “a finite automaton” and we say “two FAs” even though it stands for “two finite automata.”

Some people prefer to call the object we have just defined a finite acceptor because its sole job is to accept certain input strings and reject others. It does not do anything like print output or play music. Even so, we shall stick to the terminology “finite automaton.” When we build some in Chapter 8 that do do something, we give them special names, such as “finite automata with output.”
Let us begin by considering in detail one particular example.

Suppose that the input alphabet has only the two letters a and b. Throughout this chapter, we use only this alphabet (except for a couple of problems at the end). Let us also assume that there are only three states, x, y, and z. Let the following be the rules of transition:


Rule 1 From state x and input a, go to state y.
Rule 2 From state x and input b, go to state z.
Rule 3 From state y and input a, go to state x.
Rule 4 From state y and input b, go to state z.
Rule 5 From state z and any input, stay at state z.


Let us also designate state x as the starting state and state z as the only final state.

We now have a perfectly defined finite automaton, because it fulfills all three requirements demanded above: states, alphabet, transitions.

Let us examine what happens to various input strings when presented to this FA. Let us start with the string aaa. We begin, as always, in state x. The first letter of the string is an a, and it tells us to go to state y (by Rule 1). The next input (instruction) is also an a, and this tells us by Rule 3 to go back to state x. The third input is another a, and by Rule 1 again we go to state y. There are no more input letters in the input string, so our trip has ended. We did not finish up in the final state (state z), so we have an unsuccessful termination of our run. The string aaa is not in the language of all strings that leave this FA in state z. The set of all strings that do leave us in a final state is called the language defined by the finite automaton. The input string aaa is not in the language defined by this FA. Using other terminology, we may say that the string aaa is not accepted by this finite automaton because it does not lead to a final state. We use this expression often. We may also say, “aaa is rejected by this FA.” The set of all strings accepted is the language associated with the FA. We say, “this FA accepts the language L,” or “L is the language accepted by this FA.” When we wish to be anthropomorphic, we say that L is the language of the FA. If language $L_1$ is contained in language $L_2$ and a certain FA accepts $L_2$ (all the words in $L_2$ are accepted and all the inputs accepted are words in $L_2$), then this FA also must accept all the words in language $L_1$ (because they are also words in $L_2$). However, we do not say, “$L_1$ is accepted by this FA,” because that would mean that all the words the FA accepts are in $L_1$. This is solely a matter of standard usage.

At the moment, the only job an FA does is define the language it accepts, which is a fine reason for calling it an acceptor, or better still a language-recognizer. This last term is good, because the FA merely recognizes whether the input string is in its language much the same way we might recognize when we hear someone speak Russian without necessarily understanding what it means.

Let us examine a different input string for this same FA. Let the input be abba. As always, we start in state x. Rule 1 tells us that the first input letter, a, takes us to state y. Once we are in state y, we read the second input letter, which is a b. Rule 4 now tells us to move to state z. The third input letter is a b, and because we are in state z, Rule 5 tells us to stay there. The fourth input letter is an a, and again Rule 5 says stay put. Therefore, after we have followed the instruction of each input letter, we end up in state z. State z is designated a final state, so we have won this game. The input string abba has taken us successfully to the final state. The string abba is therefore a word in the language associated with this FA. The word abba is accepted by this FA.

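The two runs above can be traced by a short program. In this sketch the five rules become a Python dictionary from (state, letter) to the next state (Rule 5 contributes two entries, one per letter), and the hypothetical function run records every state visited, starting at x.

```python
# Rules 1-5 for the machine with states x, y, z; x is the start state, z is final.
DELTA = {
    ("x", "a"): "y",  # Rule 1
    ("x", "b"): "z",  # Rule 2
    ("y", "a"): "x",  # Rule 3
    ("y", "b"): "z",  # Rule 4
    ("z", "a"): "z",  # Rule 5
    ("z", "b"): "z",  # Rule 5
}
START, FINALS = "x", {"z"}

def run(word):
    """Return the sequence of states visited, beginning at the start state."""
    path = [START]
    for letter in word:
        path.append(DELTA[(path[-1], letter)])
    return path

def accepts(word):
    return run(word)[-1] in FINALS

print(run("aaa"), accepts("aaa"))    # ['x', 'y', 'x', 'y'] False
print(run("abba"), accepts("abba"))  # ['x', 'y', 'z', 'z', 'z'] True
```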
It is not hard for us to predict which strings will be accepted by this FA. If an input j 
string is made up of only the letter a repeated some number of times, then the action of the 4 
FA will be to jump back and forth between state a and state y. No such word can ever be ac¬ 
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The list of transition rules can grow very long. It is much simpler to summarize them in 
a table format. Each row of the table is the name of one of the states in the FA, and each col¬ 
umn of the table is a letter of the input alphabet. The entries inside the table are the new 
states that the FA moves into—the transition states. The transition table for the FA we have 
described is 


We have also indicated along the left side which states are start and final states. This 
(able has all the information necessary to define an FA. 

Instead of the lengthy description of the meaning of motion between states caused by 
input letters, FAs could simply and equivalently have been defined as static transition tables. 
Any table of the form 


in which the dots are filled with the letters a, y, and z in any fashion, and which specifies the 
start state and the final states, will be an FA. Similarly, every three-state FA corresponds to 
such a table. 

Even though it is no more than a table of symbols, we consider an FA to be a machine, 
that is, we understand that this FA has dynamic capabilities. It moves. It processes input. 
Something goes from state to state as the input is read in and executed. We may imagine that 
the state we are in at any given time is lit up and the others are dark. An FA running on an 
input string then looks like a pinball machine in operation. 

We may make the definition of FAs even more mathematically abstract (with no 
greater precision and decreased understanding) by replacing the transition table with a 
total function whose input is a pair of state and alphabet letter and whose output is a sin¬ 
gle state. This function is called the transition function, usually denoted 6 (lowercase 
Greek delta) (for reasons lost to computer historians). The abstract definition of an FA 
is then: 


Yet Another Method for Defining Languages 


cepted. To get into state z, it is necessary for the string to have the letter b in it. As soon as a 
b is encountered in the input string, the FA jumps immediately to state z no matter what state 
it was in before. Once in state z, it is impossible to leave. When the input string runs out, the 
FA will still be in state z, leading to acceptance of the string. 

The FA above will accept all strings that have the letter b in them and no other strings. 
Therefore, the language associated with (or accepted by) this FA is the one defined by the 
regular expression 


1 . 

2 . 

3. 


A finite set of states Q = {q 0 q A q 2 - • •} of which q 0 is the start state. 
A subset of Q called the final states. 

An alphabet^ ={jc, x 2 x 3 . . .}. 


(a + b)*b(a + b)* 


Start x 

y 

Final z 
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Uq r Xj) = x k 

We shall never refer to this transition function again in this volume. 

From the table format, it is hard to see the moving parts. There is a pictorial representa- 
tion of an FA that gives us more of a feel for the motion. We begin by representing each states 
by a small circle drawn on a sheet of paper. From each state, we draw arrows showing to 
which other states the different letters of the input alphabet will lead us. We label these ar 
rows with the corresponding alphabet letters. 

If a certain letter makes a state go back to itself, we indicate this by an arrow that re 
turns to the same circle—this arrow is called a loop. We can indicate the start state by label 
ing it with the word “start” or by a minus sign, and the final states by labeling them with th 
word “final” or plus signs. Notice that some states are neither - nor +. The machine w 
have already defined by the transition list and the transition table can be depicted by the 
transition diagram 


4. A transition function 8 associating each pair of state and letter with a state. 


Every input string can be interpreted as traversing a path beginning at the start state and 
moving among the states (perhaps visiting the same state many times) and finally settling in 
some particular rest state. If it is a final state, then the path has ended in success. The letters 
of the input string dictate the directions of travel. They are the directions and the fuel needed 
for motion. When we are out of letters, we must stop. 

Let us look at this machine again and at the paths generated by the input strings 

aaaabba and bbaabbbb. 


Sometimes, a start state is indicated by an arrow and a final state by drawing a box or an¬ 
other circle around its circle. The minus and plus signs, when employed, are drawn inside or 
outside the state circles. This machine can also be depicted as 


Even though we do not have names for the states, we can still determine whether a particular
input string is accepted by this machine. We start at the minus sign and proceed along
the indicated edges until we are out of input letters. If we are then at a plus sign, we accept
the word; if not, we reject it as not being part of the language of the machine.

Let us consider some more simple examples of FAs. 


When we depict an FA as circles and arrows, we say that we have drawn a directed 
graph. Graph theory is an exciting subject in its own right, but for our purposes there is no 
real need to understand directed graphs in any deeper sense than as a collection of circles 
and arrows. We borrow from graph theory the name directed edge, or simply edge, for the 
arrow between states. An edge comes from one state and leads to another (or the same, if it 
is a loop). Every state has as many outgoing edges as there are letters in the alphabet. It is 
possible for a state to have no incoming edges or to have many. 

There are machines for which it is not necessary to give the states specific names. For 
example, the FA we have been dealing with so far can be represented simply as 











EXAMPLE 


Here, the sign ± means that the same state is both a start and a final state. Because there is 
only one state and no matter what happens we must stay there, the language for this machine is 

(a + b)* ■


Let us build a machine that accepts the language of all words over the alphabet {a b} with
an even number of letters. We can start our considerations with a human algorithm for identifying
all these words. One method is to run our finger across the string from left to right
and count the number of letters as we go. When we reach the end of the string, we examine
the total and we know right away whether the string is in the language or not. This may be
the way a mathematician would approach the problem, but it is not how a computer scientist
would solve it. Because we are not interested in what the exact length of the string is, this
number represents extraneous information gathered at the cost of needlessly many calcula- 
EXAMPLE


In the picture above, we have drawn one edge from the state on the right back into itself and
given this loop the two labels a and b, separated by a comma, meaning that this is the path
traveled if either letter is read. (We save ourselves from drawing a second loop edge.) We
could have used the same convention to eliminate the need for two edges running from the
minus state to the plus state. We could have replaced these with one edge with the label a,b,
but we did not. At first glance, it looks as if this machine accepts everything. The first letter
of the input takes us to the right-hand state and, once there, we are trapped forever. When the
input string runs out, there we are in the correct final state. This description, however, omits
the possibility that the input is the null string Λ. If the input string is the null string, we are
left in the left-hand state, and we never get to the final state. There is a small problem about
understanding how it is possible for Λ ever to be an input string to an FA, because a string,
by definition, is executed (run) by reading its letters one at a time. By convention, we shall
say that Λ starts in the start state and then ends right there on all FAs.

The language accepted by this machine is the set of all strings except Λ. This has the
regular expression definitions

(a + b)(a + b)* = (a + b)+
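The Λ convention can be checked mechanically. This is a minimal sketch, assuming state names "-" and "+" for the two circles, and using Python's `re` syntax, where the book's (a + b)+ is written (a|b)+. Comparing on every string up to length 6 is evidence, not a proof.

```python
import re
from itertools import product

# "-" is the start state, "+" the final state. Any letter moves "-" to "+",
# and "+" loops back to itself on both a and b.
DELTA = {("-", "a"): "+", ("-", "b"): "+", ("+", "a"): "+", ("+", "b"): "+"}


def accepts(word):
    state = "-"                      # by convention, Λ starts and ends here
    for letter in word:
        state = DELTA[(state, letter)]
    return state == "+"


def all_words(max_len):
    """Every string over {a, b} of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


PLUS = re.compile(r"(a|b)+")         # the book's (a + b)+
assert all(accepts(w) == bool(PLUS.fullmatch(w)) for w in all_words(6))
print(accepts(""), accepts("a"), accepts("abba"))
```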


Similarly, there are FAs that accept no language. These are of two types: FAs that have 
no final states, such as 


EXAMPLE 


One of the many FAs that accepts all words is 


and FAs in which the circles that represent the final states cannot be reached from the start
state. This may be either because the picture is in two separate components as with


(in this case, we say that the graph is disconnected), or for a reason such as that shown below:


FAs AND THEIR LANGUAGES


We consider these examples again in Chapter 11. 


It is possible to look at the world of FAs in two ways. We could start with the machine and 
try to analyze it to see what language it accepts, or we could start with a desired language in 
our mind and try to construct an FA that would act as a language-recognizer or language- 
definer. Needless to say, in real life we seldom discover an FA falling out of a cereal box or 
etched onto a mummy’s sarcophagus; it is usually our desire to construct an FA from scratch 
for the precise purpose of acting as a language-recognizer for a specific language for which 
we were looking for a practical algorithmic definition. 
When a language is defined by a regular expression, it is easy to produce some arbitrary
words that are in the language by making a set of choices for the meaning of the pluses and
stars, but it is harder to recognize whether a given string of letters is or is not in the language
defined by the expression. The situation with an FA is just the opposite. If we are given a
specific string, we can decide by an algorithmic procedure whether or not it is in the language
defined by the machine: just run it and see if the path it determines ends in a final
state. On the other hand, given a language defined by an FA, it is not so easy to write down a
bunch of words that we know in advance the machine will accept.

Therefore, we must practice studying FAs from two different angles: Given a language,
can we build a machine for it, and given a machine, can we deduce its language?
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tions. A good programmer would employ instead what is called a Boolean flag; let us call iti 
E for even. If the number of letters read so far is indeed even, then E should have the value
TRUE. If the number of letters read is not even, then E should have the value FALSE. Initially,
we set E equal to TRUE, and every time we read a letter, we reverse the value of E until
we have exhausted the input string. When the input letters have run out, we check the
value of E. If it is TRUE, then the input string is in the language; if FALSE, it is not.

The program looks something like this: 

set E = TRUE
while not out of data do
    read an input letter
    E becomes not(E)

if E = TRUE, accept the input string
else reject the string

Because the computer employs only one storage location in the processing of this program
and that location can contain only one of two different values, the finite automaton for this
language should require only two states:

State 1 E is TRUE; this is the start state and the accept or final state. 

State 2 E is FALSE.

Every time an input letter is read, whether it is an a or a b, the state of the FA changes. This
machine is pictured below:
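The correspondence between the Boolean flag and the two states can be made concrete. This is a minimal Python sketch, assuming states are numbered 1 and 2 as in the list above; it runs the flag algorithm and the two-state machine side by side and checks that they agree.

```python
def even_by_flag(word):
    """The Boolean-flag algorithm from the pseudocode: flip E on every letter."""
    e = True                    # zero letters read so far, an even count
    for _ in word:              # which letter was read does not matter
        e = not e
    return e


# The same algorithm as a two-state FA: state 1 means E is TRUE (start and final),
# state 2 means E is FALSE. Every letter, a or b, switches the state.
DELTA = {(1, "a"): 2, (1, "b"): 2, (2, "a"): 1, (2, "b"): 1}


def even_by_fa(word):
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 1


for w in ["", "a", "ab", "bab", "abba"]:
    assert even_by_flag(w) == even_by_fa(w) == (len(w) % 2 == 0)
```

The single storage location E is exactly the information the FA keeps in its current state.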


The same language may be accepted by a four-state machine, as below:


Only the word a ends in the first + state. All other words starting with an a reach and finish 
in the second + state where they are accepted. 

This idea can be carried further to a five-state FA as below: 


EXAMPLE 


The examples above are FAs that have more than one final state. From them, we can
see that there is not a unique machine for a given language. We may then ask the question,
“Is there always at least one FA that accepts each possible language? More precisely,
if L is some language, is there necessarily a machine of this type that accepts
exactly the inputs in L, while forsaking all others?” We shall see shortly that this question
is related to the question, “Can all languages be represented by regular expressions?”

We shall prove, in Chapter 7, that every language that can be accepted by an FA can 
be defined by a regular expression and, conversely, every language that can be defined 
by a regular expression can be accepted by some FA. However, we shall see that there 
are languages that are neither definable by a regular expression nor accepted by an 
FA. Remember, for a language to be the language accepted by an FA means not only that 
all the words in the language run to final states, but also that no strings not in the 
language do. 

Let us consider some more examples of FAs. 









EXAMPLE 


Consider the FA pictured below: 


Before we begin to examine what language this machine accepts, let us trace the paths associated
with some specific input strings. Let us input the string ababa. We begin at the start
state 1. The first letter is an a, so it takes us to state 2. From there the next letter, b, takes us
to state 3. The next letter, a, then takes us back to state 2. The fourth letter is a b and that
takes us to state 3 again. The last letter is an a that returns us to state 2 where we end. State 2
is not a final state (no +), so this word is not accepted.

Let us trace the word babbb. As always, we start in state 1. The first letter, b, takes us to 
state 3. An a then takes us to state 2. The third letter, b, takes us back to state 3. Now another b 
takes us to state 4. Once in state 4, we cannot get out no matter what the rest of the string is. 
Once in state 4, we must stay in state 4, and because that is the final state, the string is accepted. 

There are two ways to get to state 4 in this FA. One is from state 2, and the other is from
state 3. The only way to get to state 2 is by reading the input letter a (while in either state 1
or state 3). So when we are in state 2, we know we have just read an a. If we read another a
immediately, we go straight to state 4. It is a similar situation with state 3. To get to state 3,
we need to read a b. Once in state 3, if we read another b immediately, we go to state 4;
otherwise, we go to state 2.

Whenever we encounter the substring aa in an input string, the first a must take us to
state 4 or 2. Either way, the next a takes us to state 4. The situation with bb is analogous. If
we are in any of the four states 1, 2, 3, or 4 and we read two a's, we end up in state 4. If we
are in any state and read two b's, we end up in state 4. State 4, once entered, cannot be left.
To end in state 4, we must read a double letter.

In summary, the words accepted by this machine are exactly those strings that have a
double letter in them. This language, as we have seen, can also be defined by the regular
expression

(a + b)*(aa + bb)(a + b)* 

The four states in this machine can be characterized by the purposes they serve: 

State 1 Start here but do not get too comfortable; you are going to leave immediately. 

State 2 We have just read an a that was not preceded by an a and we are looking for a 
second a as the next input. 

State 3 We have just read a b that was not preceded by a b and we are looking for a
second b as the next input. 

State 4 We have already discovered the existence of a double letter in the input string 
and we are going to wait out the rest of the input sequence and then announce 
acceptance when it is all over. 
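The state descriptions above determine the whole transition table, so the two traces and the regular expression can be checked together. This is a minimal sketch: the table is read off the prose (the figure is not reproduced here), and the book's (a + b)*(aa + bb)(a + b)* is rewritten in Python `re` syntax.

```python
import re
from itertools import product

# Transition table read off the state descriptions; state 4 is the only final state.
DELTA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 4, (2, "b"): 3,   # a second a in a row completes aa
    (3, "a"): 2, (3, "b"): 4,   # a second b in a row completes bb
    (4, "a"): 4, (4, "b"): 4,   # once a double letter is found, stay put
}


def path(word):
    states = [1]
    for letter in word:
        states.append(DELTA[(states[-1], letter)])
    return states


print(path("ababa"))   # [1, 2, 3, 2, 3, 2]  ends in 2: rejected
print(path("babbb"))   # [1, 3, 2, 3, 4, 4]  ends in 4: accepted

# Cross-check against (a + b)*(aa + bb)(a + b)* on all short words.
DOUBLE = re.compile(r"[ab]*(aa|bb)[ab]*")
for n in range(8):
    for w in map("".join, product("ab", repeat=n)):
        assert (path(w)[-1] == 4) == bool(DOUBLE.fullmatch(w))
```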
EXAMPLE




Let us contemplate the possibility of building an FA that accepts all words containing a triple
letter, either aaa or bbb, and only those words.

The machine must have a start state. From the start state, it must have a path of three 
edges, with no loop, to accept the word aaa. Therefore, we begin our machine with 


For similar reasons, we can deduce that there must be a path for bbb that has no loop
and uses entirely different states. If the b-path shared any of the same states as the
a-path, we could mix a's and b's and mistakenly get to + anyway. We need only two
additional states because the paths could share the same final state without a problem, as
below:


If we are moving anywhere along the a-path and we read a b before the third a, we jump to
the b-path in progress and vice versa. The whole FA then looks like this:


We can understand the language and functioning of this FA because we have seen how it
was built. If we had started with the final picture and tried to interpret its meaning, we would
be sailing uncharted waters.
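The construction can also be written as a table and tested against the English description. This is a minimal sketch; the state names a1, a2, b1, b2 are our own hypothetical labels (the picture leaves them unnamed), meaning "we have just read one (two) a's in a row" and likewise for b.

```python
from itertools import product

# a1/a2: we have just read one/two a's in a row; b1/b2 likewise for b's.
# These labels are illustrative, not taken from the figure.
DELTA = {
    ("-", "a"): "a1", ("-", "b"): "b1",
    ("a1", "a"): "a2", ("a1", "b"): "b1",   # a b before the third a jumps to the b-path
    ("a2", "a"): "+",  ("a2", "b"): "b1",
    ("b1", "b"): "b2", ("b1", "a"): "a1",   # and vice versa
    ("b2", "b"): "+",  ("b2", "a"): "a1",
    ("+", "a"): "+",   ("+", "b"): "+",     # the shared final state traps the input
}


def accepts(word):
    state = "-"
    for letter in word:
        state = DELTA[(state, letter)]
    return state == "+"


# Exhaustive check on all words of length below 10.
for n in range(10):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(w) == ("aaa" in w or "bbb" in w)
```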
Starting at the start state, anything but the sequence baa will drop down into the collecting
bucket at the bottom, never to be seen again. Even the word baabb will fail. It will reach the
final state marked with a +, but then the next letter will suicide over the edge.

The language accepted by this FA is 

L = {baa} ■ 
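The "collecting bucket" is just an ordinary state that every edge leads back into. This is a minimal sketch, assuming state labels 1 to 4 along the path b, a, a and the hypothetical label "trap" for the bucket.

```python
# Five states: 1 (start), 2 (read b), 3 (read ba), 4 (read baa, final) and a
# trap state standing in for the collecting bucket. Labels are our own.
DELTA = {
    (1, "b"): 2, (1, "a"): "trap",
    (2, "a"): 3, (2, "b"): "trap",
    (3, "a"): 4, (3, "b"): "trap",
    (4, "a"): "trap", (4, "b"): "trap",      # any letter after baa falls off the edge
    ("trap", "a"): "trap", ("trap", "b"): "trap",
}


def accepts(word):
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 4


print([w for w in ["baa", "baabb", "ba", "abaa"] if accepts(w)])  # ['baa']
```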


The FA below accepts exactly the two strings baa and ab.


EXAMPLE 

Let us take a trickier example. Consider the FA shown below: 


What is the language accepted by this machine? We start at state 1, and if we are reading
a word starting with an a, we go straight to the final state 3. We can stay at state 3 as
long as we continue to read only a's. Therefore, all words of the form
In this characterization, if we read a b while in state 2, we go to state 3, hoping for another b,
whereas if we read an a in state 3, we go to state 2, hoping for a baby a.


EXAMPLE 


Big machine, small language. 


Let us consider the FA pictured below: 


This machine will accept all words with b as the third letter and reject all other words. States
1 and 2 are only waiting states eating up the first two letters of input. Then comes the decision
at state 3. A word that has fewer than three letters cannot qualify, and its path ends in
one of the first three states, none of which is designated +. Once we get to state 3, only the
low road leads to acceptance.

Some regular expressions that define this language are 


(aab + abb + bab + bbb)(a + b)* 


(a + b)(a + b)(b)(a + b)* = (a + b)²b(a + b)*


Notice that this last formula is not, strictly speaking, a regular expression, because it
uses the symbol 2, which is not included in the kit.
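A machine of this shape can be checked against the first regular expression above. This is a minimal sketch: states 1 and 2 count off two letters and state 3 inspects the third, while "+" and "dead" are our own hypothetical labels for the accepting and rejecting sinks, since the figure is not reproduced here.

```python
import re
from itertools import product

# States 1 and 2 just count off the first two letters; state 3 inspects the third.
# "+" and "dead" are illustrative labels for the two sink states.
DELTA = {}
for letter in "ab":
    DELTA[(1, letter)] = 2
    DELTA[(2, letter)] = 3
    DELTA[("+", letter)] = "+"
    DELTA[("dead", letter)] = "dead"
DELTA[(3, "b")] = "+"          # the low road
DELTA[(3, "a")] = "dead"


def accepts(word):
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == "+"


# (a + b)(a + b)(b)(a + b)* in Python regex syntax.
THIRD_B = re.compile(r"[ab][ab]b[ab]*")
for n in range(8):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(w) == bool(THIRD_B.fullmatch(w))
```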


EXAMPLE 


Let us consider a very specialized FA, one that accepts only the word baa : 
are accepted by this machine. What if we began with some a's that take us to state 3 but then
we read a b? This then transports us to state 2. To get back to the final state, we must proceed
to state 4 and then to state 3. These trips require two more b's to be read as input.
Notice that in states 2, 3, and 4 all a's that are read are ignored. Only b's cause a change of
state.

Recapitulating what we know: If an input string begins with an a and then has some b's,
it must have 3 b's to return us to state 3, or 6 b's to make the trip (state 2, state 4, state 3)
twice, or 9 b's, or 12 b's and so on. In other words, an input string starting with an a and
having a total number of b's divisible by 3 will be accepted. If it starts with an a and has a
total number of b's not divisible by 3, then the input is rejected because its path through the
machine ends at state 2 or 4.

What happens to an input string that begins with a b? It finds itself in state 2 and needs
two more b's to get to state 3 (these b's can be separated by any number of a's). Once in
state 3, it needs no more b's, or three more b's, or six more b's, and so on.

All in all, an input string, whether beginning with an a or a b, must have a total number
of b's divisible by 3 to be accepted. It is also clear that any string meeting this requirement
will reach the final state.

The language accepted by this machine can be defined by the regular expression 

a*(a*ba*ba*ba*)*(a + a*ba*ba*ba*) 

The only purpose for the last factor is to guarantee that Λ is not a possibility because it is not
accepted by the machine. If we did not mind Λ being included in the language, we could
have used this simpler FA:



The regular expression 

(a + ba*ba*b)+

also defines the original (non-Λ) language, whereas the regular expression

(a*ba*ba*ba*)* 

defines the language of the second machine. 
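The argument above can be tested by machine. This is a minimal sketch, assuming the transition table reconstructed from the prose (the figure is not reproduced here): from state 1, a goes to 3 and b to 2; in states 2, 3 and 4 an a loops, and each b moves one step around the cycle 3 to 2 to 4 to 3. Both regular expressions from the text are rewritten in Python `re` syntax.

```python
import re
from itertools import product

# Reconstructed from the description (an assumption): a's are ignored in
# states 2, 3 and 4, and each b moves one step around the cycle 3 -> 2 -> 4 -> 3.
DELTA = {
    (1, "a"): 3, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 3, (3, "b"): 2,
    (4, "a"): 4, (4, "b"): 3,
}


def accepts(word):
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state == 3


# a*(a*ba*ba*ba*)*(a + a*ba*ba*ba*) and (a + ba*ba*b)+ in Python regex syntax.
LONG = re.compile(r"a*(a*ba*ba*ba*)*(a|a*ba*ba*ba*)")
SHORT = re.compile(r"(a|ba*ba*b)+")

for n in range(9):
    for w in map("".join, product("ab", repeat=n)):
        expected = len(w) > 0 and w.count("b") % 3 == 0   # non-empty, b's divisible by 3
        assert accepts(w) == expected
        assert bool(LONG.fullmatch(w)) == expected
        assert bool(SHORT.fullmatch(w)) == expected
```

State 1 is visited only before the first letter, which is why Λ, with zero b's, is still rejected.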


EXAMPLE 


The following FA accepts only the word Λ:
This can be better understood by examining the path through the FA of the input string
abbaabb, as shown below:


EXAMPLE 


Consider the following FA 


This machine will accept the language of all words with a double a in them somewhere. We
stay in the start state until we read our first a. This moves us to the middle state. If the
next letter is another a, we move to the + state, where we must stay and eventually be
accepted. If the next letter is a b, however, we go back to - to wait for the next a.

We can identify the purposes that these states serve in the machine as follows: 

Start state The previous input letter (if there was one) was not an a. 

Middle state We have just read an a that was not preceded by an a. 

Final state We have already encountered a double a and we are going to sit here
until the input is exhausted.

Clearly, if we are in the start state and we read an a, we go to the middle state, but if we
read a b, we stay in the start state. When in the middle state, an a sends us to nirvana, where
ultimate acceptance awaits us, whereas a b sends us back to start, hoping for the first a of a
double a.
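The three roles listed above translate directly into a table. This is a minimal sketch; "-", "mid" and "+" are our own labels for the start, middle and final states.

```python
from itertools import product

DELTA = {
    ("-", "a"): "mid", ("-", "b"): "-",     # start: the previous letter was not an a
    ("mid", "a"): "+", ("mid", "b"): "-",   # middle: we just read a single a
    ("+", "a"): "+",   ("+", "b"): "+",     # final: aa already seen, wait it out
}


def accepts(word):
    state = "-"
    for letter in word:
        state = DELTA[(state, letter)]
    return state == "+"


# The machine should agree with a plain substring test on every short word.
for n in range(10):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(w) == ("aa" in w)
```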


It will be useful for us to consider this FA as having a primitive memory device. For the 
top two states, no matter how much bouncing we do between them, remember that the first 
letter read from the input string was an a (otherwise, we would never have gotten up here to 
begin with). For the bottom two states, remember that the first input letter was a b. 

Lower non + state The input started with a b and the last letter we have read from the 
input string is also a b. 

Lower + state The input started with a b and the last letter read so far is an a. ■ 
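The "primitive memory" reading suggests a table for the machine that accepts words whose first and last letters differ. This is a minimal sketch under stated assumptions: the state names are our own, the top pair remembers that the first letter was a, the bottom pair remembers that it was b, and within each pair the state records the most recent letter.

```python
from itertools import product

# Illustrative labels: "top"/"top+" remember a first a, "bottom"/"bottom+" a first b.
DELTA = {
    ("-", "a"): "top", ("-", "b"): "bottom",
    ("top", "a"): "top",  ("top", "b"): "top+",       # top+: last letter read is b
    ("top+", "a"): "top", ("top+", "b"): "top+",
    ("bottom", "b"): "bottom",  ("bottom", "a"): "bottom+",   # bottom+: last letter is a
    ("bottom+", "b"): "bottom", ("bottom+", "a"): "bottom+",
}
FINALS = {"top+", "bottom+"}


def accepts(word):
    state = "-"
    for letter in word:
        state = DELTA[(state, letter)]
    return state in FINALS


for n in range(9):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(w) == (len(w) > 0 and w[0] != w[-1])
```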


EVEN-EVEN REVISITED 


EXAMPLE 


EXAMPLE 


As the next example of an FA in this chapter, let us consider the picture below 


The following FA accepts all words that have different first and last letters. If the word
begins with an a, to be accepted it must end with a b and vice versa.


To process a string of letters, we start at state 1, which is in the upper left of the picture. 
Every time we encounter a letter a in the input string, we take an a train. There are four 
edges labeled a. All the edges marked a go either from one of the upper two states (states 1 
and 2) to one of the lower two states (states 3 and 4), or else from one of the lower two states 


If we start with an a, we take the high road and jump back and forth between the two
top states, ending on the right (at +) only if the last letter read is a b. If we start with
a b, we go south. Here, we get to the + on the bottom only when we read an a as the last letter.
Notice how much easier it is to understand the FA than the regular expression. Both
methods of defining languages have advantages, depending on the desired application, but
in a theory course we rarely consider applications except in the following example.


EXAMPLE 

We are programmers hired to write a word processor. As part of this major program, we
must build a subroutine that scans any given input string of English letters and spaces and
locates the first occurrence of the substring cat whether it is a word standing alone or part of a
longer word such as abdicate.

We envision the need for four states: 


State 1 We have not just read a c; this is the start state. 

State 2 The last letter read was a c. 

State 3 The last letter read was an a that came after a c. 

State 4 We have just encountered the substring cat and control of this program must
transfer somewhere else.

If we are in state 1 and read anything but a c, we stay there. In state 1 if we read a c, we
go unconditionally to state 2.


to one of the upper two states. If we are north and we read an a, we go south. If we are south
and we read an a, we go north. The letter a reverses our up/down status.

What happens to a word that gets accepted and ends up back in state 1? Without knowing
anything else about the string, we can say that it must have had an even number of a's in
it. Every a that took us south was balanced by some a that took us back north. We crossed
the Mason-Dixon line an even number of times, one for each a. So, every word in the
language of this FA has an even number of a's in it. Also, we can say that every input string
with an even number of a's will finish its path in the north (state 1 or 2).

There is more that we can say about the words that are accepted by this machine.
There are four edges labeled b. Every edge labeled b either takes us from one of the two
states on the left of the picture (states 1 and 3) to one of the two states on the right (states
2 and 4), or else takes us from one of the two states on the right to one of the two states
on the left. Every b we encounter in the input is an east/west reverser. If the word starts
out in state 1, which is on the left, and ends up back in state 1 (on the left), it must have
crossed the Mississippi an even number of times. Therefore, all the words in the language
accepted by this FA have an even number of b's as well as an even number of a's. We can
also say that every input string with an even number of b's will leave us in the west (state
1 or 3).

These are the only two conditions on the language. All words with an even number of
a's and an even number of b's must return to state 1. All words that return to state 1 are
EVEN-EVEN. All words that end in state 2 have crossed the Mason-Dixon line an even
number of times but have crossed the Mississippi an odd number of times; therefore, they
have an even number of a's and an odd number of b's. All the words that end in state 3 have
an even number of b's but an odd number of a's. All words that end in state 4 have an odd
number of a's and an odd number of b's. So again, we see that all the EVEN-EVEN words
must end in state 1 and be accepted.

One regular expression for the language EVEN-EVEN was discussed in detail in the
previous chapter.
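The north/south and east/west argument amounts to saying that the final state records the parity of the a's and the parity of the b's. This is a minimal sketch, assuming states 1 and 2 are the upper pair, states 1 and 3 the left pair, as in the text.

```python
from itertools import product

# a swaps north and south (1<->3, 2<->4); b swaps west and east (1<->2, 3<->4).
DELTA = {
    (1, "a"): 3, (3, "a"): 1, (2, "a"): 4, (4, "a"): 2,
    (1, "b"): 2, (2, "b"): 1, (3, "b"): 4, (4, "b"): 3,
}


def final_state(word):
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state


# (parity of a's, parity of b's) -> the state the argument above predicts.
EXPECTED = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
for n in range(10):
    for w in map("".join, product("ab", repeat=n)):
        parity = (w.count("a") % 2, w.count("b") % 2)
        assert final_state(w) == EXPECTED[parity]
```

Only state 1 is final, so the accepted words are exactly those with parity (0, 0), that is, EVEN-EVEN.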


If we are in state 2 and we read an a, we go to state 3. If we read another c, we stay in
state 2 because this other c may be the beginning of the substring cat. If we read anything
else, we go back to state 1.

If we are in state 3 and we read a t, then we go to state 4. If we read any other letter except
c, we have to go back to state 1 and start all over again, but if we read a c, then we go to
state 2 because this could be the start of something interesting.

The machine looks like this: 



The input Boccaccio will go through the sequence of states 1-1-1-2-2-3-2-2-1-1 and the
input will not be accepted.

The input desiccate will go through the states 1-1-1-1-1-2-2-3-4 and terminate (which
in this example is some form of acceptance) before reading the final e. ■
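The four-state scanner can be sketched as a small loop that stops the moment state 4 is reached. This is a minimal illustration, not the word processor's actual subroutine: the function name `scan` is hypothetical, and letters are compared exactly as given (no case folding).

```python
def scan(text):
    """Return the states visited, stopping as soon as state 4 (cat found) is reached."""
    state, path = 1, [1]
    for ch in text:
        if state == 1:                 # have not just read a c
            state = 2 if ch == "c" else 1
        elif state == 2:               # last letter read was a c
            state = 3 if ch == "a" else 2 if ch == "c" else 1
        elif state == 3:               # last two letters were ca
            state = 4 if ch == "t" else 2 if ch == "c" else 1
        path.append(state)
        if state == 4:                 # transfer control elsewhere
            break
    return path


print("-".join(map(str, scan("Boccaccio"))))   # 1-1-1-2-2-3-2-2-1-1
print("-".join(map(str, scan("desiccate"))))   # 1-1-1-1-1-2-2-3-4
```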


1. Write out the transition tables for the FAs on pp. 56, 58 (both), 63, 64, and 69 that were 
defined by pictures. 

2. Build an FA that accepts only the language of all words with b as the second letter.
Show both the picture and the transition table for this machine and find a regular expression
for the language.

3. Build an FA that accepts only the words baa, ab , and abb and no other strings longer or 
shorter. 

4. (i) Build an FA with three states that accepts all strings. 

(ii) Show that given any FA with three states and three +’s, it accepts all input strings. 

(iii) If an FA has three states and only one +, must it reject some inputs? 

5. (i) Build an FA that accepts only those words that have more than four letters. 

(ii) Build an FA that accepts only those words that have fewer than four letters. 

(iii) Build an FA that accepts only those words with exactly four letters. 

6. Build an FA that accepts only those words that do not end with ba. 

7. Build an FA that accepts only those words that begin or end with a double letter. 

8. Build an FA that accepts only those words that have an even number of substrings ab. 

9. (i) Recall from Chapter 4 the language of all words over the alphabet {a b} that
have both the letter a and the letter b in them, but not necessarily in that order.
Build an FA that accepts this language. 
(ii) Build an FA that accepts the language of all words with only a's or only b's in
them. Give a regular expression for this language.

10. Consider all the possible FAs over the alphabet {a b} that have exactly two states.
An FA must have a designated start state, but there are four possible ways to place
the +'s:


15. Build a machine that accepts all strings that have an even length that is not divisible


16. Build an FA such that when the labels a and b are swapped the new machine is different 
from the old one but equivalent (the language defined by these machines is the same). 

17. Describe in English the languages accepted by the following FAs: 


Each FA needs four edges (two from each state), each of which can lead to either of the two
states. There are 2⁴ = 16 ways to arrange the labeled edges for each of the four types of
FAs. Therefore, in total there are 64 different FAs of two states. However, they do not
represent 64 nonequivalent FAs because they are not all associated with different
languages. All type 1 FAs do not accept any words at all, whereas all FAs of type 4 accept
all strings of a's and b's.

(i) Draw the remaining FAs of type 2. 

(ii) Draw the remaining FAs of type 3. 

(iii) Recalculate the total number of two-state machines using the transition table definition.


11. Show that there are exactly 5832 different finite automata with three states x, y, z over
the alphabet {a b}, where x is always the start state.

12. Suppose a particular FA, called FIN, has the property that it has only one final state that
was not the start state. During the night, vandals come and switch the + sign with the -
sign and reverse the direction of all the edges.

(i) Show that the picture that results might not actually be an FA at all by giving an
example.

(ii) Suppose, however, that in a particular case what resulted was, in fact, a perfectly
good FA. Let us call it NIF. Give an example of one such machine.

(iii) What is the relationship between the language accepted by FIN and the language
accepted by NIF as described in part (ii)? Why?

(iv) One of the vandals told me that if in FIN the plus state and the minus state were the
same state, then the language accepted by the machine could contain only palindromic
words. Defeat this vandal by example.

13. We define a removable state as a state such that if we erase the state itself and the edges
that come out of it, what results is a perfectly good-looking FA.

(i) Give an example of an FA that contains a removable state.

(ii) Show that if we erase a removable state the language defined by the reduced FA is
exactly the same as the language defined by the old FA.

14. (i) Build an FA that accepts the language of all strings of a's and b's such that the
next-to-last letter is an a.

(ii) Build an FA that accepts the language of all strings of length 4 or more such that
the next-to-last letter is equal to the second letter of the input string.


(iv) Write regular expressions for the languages accepted by these three machines 
18. The following is an FA over the
that have an odd number of occi 



19. Consider the following FA: 


(i) Show that any input string with more than three letters is not accepted by this FA.

(ii) Show that the only words accepted are a, aab, and bab.

(iii) Show that by changing the location of + signs alone, we can make this FA accept
the language {bb aba bba}.

(iv) Show that any language in which the words have fewer than four letters can be 
accepted by a machine that looks like this one with the + signs in different 
places. 

(v) Prove that if L is a finite language, then there is some FA that accepts L by extending
the binary-tree part of this machine several more layers if necessary.

20. Let us consider the possibility of an infinite automaton that starts with this infinite
binary tree:



Let L be any infinite language of strings of a's and b's whatsoever. Show that by the
judicious placement of +'s, we can turn the picture above into an infinite automaton to
accept the language L. Show that for any given finite string, we can determine from this
machine, in a finite time, whether it is a word in L. Discuss why this machine would not
be a satisfactory language-definer for L.






CHAPTER 6 


Transition Graphs


RELAXING THE RESTRICTION ON INPUTS 

We saw in the last chapter that we could build an FA that accepts only the word baa. The
example we gave required five states primarily because an FA can read only one letter from the
input string at a time. Suppose we designed a more powerful machine that could read either
one or two letters of the input string at a time and could change its state based on this input
information. We might design a machine like the one below:



Because when we say “build a machine,” all we have to do is scribble on paper (we do
not have to solder, weld, and screw), we could easily change the rules of what constitutes a
machine and allow such pictures as the one above. The objects we deal with in this book are
only mathematical models. In general, practically anything can be a mathematical model as
long as it is a well-defined set of rules for playing with some abstract constructs, but the
obvious question remains: a mathematical model of what?

The FAs defined in the previous chapter started out on a dubious note when they were
analogized to being mathematical models of children’s games. However, we did later produce
some reasons for thinking that they were of use to computer science because they represent,
in a meaningful way, states in certain programmable algorithms. The mathematical
models that we shall introduce in this chapter will differ in a significant way. We cannot
as of yet explain the direct application of these entities to the normal experience of a
programming student. That does not mean that their importance must be accepted on blind
faith, merely with patience. They will be of utmost practical value for us in the all-important
next chapter. Beyond that service, the underlying special features that distinguish them from FAs will
introduce us to a theme that will recur often in our study of computer theory. As for the
moment, we are proposing to investigate a variation of FAs. There are still states and edges that
consume input letters, but we have abandoned the requirement that the edges eat just one
letter at a time. As we shall see soon, this is accompanied by several other coordinated
adjustments.

If we are interested in a machine that accepts only the word baa, why stop at assuming
that the machine can read just two letters at a time? A machine that accepts this word and
that can read up to three letters at a time from the input string could be built with even fewer
states:
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/ If we interpret the picture on the right as an FA-like machine, we see that not only does 
baa alone get to the final state, but all other input strings end up actually nowhere. If we start 
- in the minus state and the first letter of the input is an a, we have no direction as to what to 
, _ do. The picture on the left at least tells us that when the input fails to be of the desired form, 
we must go to the garbage collection state and read through the rest of the input string in the 
full knowledge that we can never leave there. 

The picture on the right gives us another problem with the input baabb. The first three letters take us to the accept state, but then something undetermined (presumably bad) happens when we read any more of the input letters. According to the rules of FAs, one cannot stop reading input letters until the input string completely runs out. The picture on the right does not tell us where to go for most of the situations we may have to face while reading inputs. By convention, we shall assume that there is associated with the picture, but not drawn, some trash-can state that we must go to whenever none of the legal edge crossings indicated in the picture is possible. Once in this state, we must abandon all hope of ever leaving and getting to acceptance. Many of the FAs in the previous chapter had such inescapable nonacceptance black holes that had to be drawn in detail. We now consider the two pictures above to be equivalent for all practical purposes. They are only distinguishable in trivial ways, such as by having a different number of states, but they accept the exact same language.
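The trash-can convention can be made concrete in a few lines. The sketch below is a minimal Python illustration, assuming a machine that reads exactly one letter per edge (a simplification of the pictures above, which may read several letters at once); the state numbers, the state name `trash`, and both helper functions are hypothetical. Every missing entry of the transition table is filled in with a move to the trash state, which loops to itself on every letter.

```python
def complete_with_trash(delta, states, alphabet, trash="trash"):
    """Return a total transition table: every missing (state, letter) entry
    is sent to a new trash state, which loops to itself on every letter."""
    total = dict(delta)
    for q in list(states) + [trash]:
        for c in alphabet:
            total.setdefault((q, c), trash)
    return total

def run_fa(delta, start, finals, word):
    """Run a machine whose table is total; it can never get stuck."""
    q = start
    for c in word:
        q = delta[(q, c)]
    return q in finals

# Hypothetical one-letter-per-edge machine for the language {baa},
# drawn without its trash state, so most table entries are missing.
partial = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3}
total = complete_with_trash(partial, [0, 1, 2, 3], "ab")

for w in ["baa", "baabb", "a", "ba"]:
    print(w, run_fa(total, 0, {3}, w))
# baa True
# baabb False
# a False
# ba False
```

Drawing the completed table would add one state and many edges, which is exactly the clutter the convention lets us leave out of the picture.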

Rather than an imaginary hell-state as we have described just now, it is more standard to introduce a new term to describe what happens when an input is running on a machine and gets into a state from which it cannot escape though it has not yet been fully read.


DEFINITION


When an input string that has not been completely read reaches a state (final or otherwise) that it cannot leave because there is no outgoing edge that it may follow, we say that the input (or the machine) crashes at that state. Execution then terminates and the input must be rejected.


Let us make note of the fact that on an FA it is not possible for any input to crash, because there is always an outgoing a-edge and an outgoing b-edge from each state. As long as there remain letters unread, progress is possible.

There are now two different ways that an input can be rejected: It could peacefully trace a path ending in a nonfinal state, or it could crash while being processed. These two different ways of being unsuccessful are the experience of all programmers.
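The two kinds of rejection can be told apart mechanically. This is a minimal sketch, assuming a hypothetical one-letter-per-edge machine for the single word baa whose transition table simply has gaps where no edge is drawn; the function name and return strings are illustrative choices, not the book's notation.

```python
def run_partial(delta, start, finals, word):
    """Run a one-letter-per-edge machine whose transition table may have gaps.
    Returns 'accept', 'reject' (input used up in a nonfinal state), or
    'crash' (letters remain but no outgoing edge can be followed)."""
    q = start
    for c in word:
        if (q, c) not in delta:
            return "crash"
        q = delta[(q, c)]
    return "accept" if q in finals else "reject"

# Hypothetical machine for {baa}: 0 -b-> 1 -a-> 2 -a-> 3 (final), nothing else.
partial = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3}
for w in ["baa", "ba", "baabb", "a"]:
    print(w, run_partial(partial, 0, {3}, w))
# baa accept
# ba reject
# baabb crash
# a crash
```

Note that baabb crashes even though it passes through the final state: it still has letters to read when it gets there.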

If we hypothesize that a machine can read one or two letters at a time, then one can be built using only two states that can recognize all words that contain a double letter:



If we are going to bend the rules to allow for a machine like the last one, we must realize that we have changed something more fundamental than just the way the edges are labeled or the number of letters read at a time. This last machine makes us exercise some choice in its running. We must decide how many letters to read from the input string each time we go back for more. This decision is quite important.

Let us say, for example, that the input string is baa. It is easy to see how this string can be accepted by this machine. We first read the letter b, which leaves us back at the start state by taking the loop on the left. Then we decide to read both letters aa at once, which allows us to take the highway to the final state where we end. However, if after reading the single character b, we then decided to read the single character a, we would loop back and be stuck at the start state again. When the third letter is read, we would still be at the starting position, and we could not then accept this string. There are two different paths that the input baa can take through this machine. This is totally different from the situation we had before, especially because one path leads to acceptance and one to rejection.

Another bad thing that might have happened is that we could have started processing the string baa by reading the first two letters at once. Because ba is not a double letter, we could not move to the final state. In fact, when we read ba, no edge tells us where to go, because ba is not the label of any edge leaving the start state. The processing of this string breaks down at this point and the machine crashes. So, there is the inherent possibility of reading variable amounts of letters from the input at each state. Therefore, the input string can follow a variety of paths through the machine, differing not only in their edge-length but also in their final disposition. Some paths may lead to acceptance the usual way and some to rejection in two ways: either by ending in a nonfinal state or by causing the whole machine to crash. What shall we say? Is this input string part of the language of this machine or not? It cannot be made to depend on the cleverness or whim of the machine operator and the number of letters he or she feels like inputting at each state; it must be an absolute yes or no, or else the language is not well defined in the sense that we have been using.

The result of these considerations is that if we are going to change the definition of our abstract machine to allow for more than one letter to be read at a time, we must also change the definition of acceptance. We shall say that a string is accepted by a machine if there is some way it could be processed so as to arrive at a final state. There may also be ways in which this string does not get to a final state, but we ignore all failures.
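The new definition of acceptance ("some way of processing succeeds") can be checked by trying every choice the operator might make. The sketch below is a minimal Python illustration, assuming a hypothetical encoding of the two-state double-letter machine described above (the start state loops on a and b, edges aa and bb lead to the final state, and the final state loops on a and b); the figure is not reproduced, so this encoding is a reconstruction from the text. At each step the operator chooses how many letters to read, and a chunk that labels no edge produces a crash.

```python
from itertools import product

# Hypothetical encoding of the double-letter machine: (source, label, target).
DOUBLE = [("-", "a", "-"), ("-", "b", "-"), ("-", "aa", "+"), ("-", "bb", "+"),
          ("+", "a", "+"), ("+", "b", "+")]

def outcomes(edges, state, word, finals, path=()):
    """Yield (chunks read, outcome) for every way an operator could process
    `word`, choosing at each step how many letters to read (1 up to the
    longest edge label). Outcomes are 'accept', 'reject', or 'crash'."""
    if not word:
        yield path, ("accept" if state in finals else "reject")
        return
    longest = max(len(label) for _, label, _ in edges)
    for k in range(1, min(longest, len(word)) + 1):
        chunk = word[:k]
        targets = [dst for src, label, dst in edges if src == state and label == chunk]
        if not targets:                      # no edge carries this chunk: crash
            yield path + (chunk,), "crash"
        for dst in targets:
            yield from outcomes(edges, dst, word[k:], finals, path + (chunk,))

def accepts(edges, start, finals, word):
    """A word is accepted if SOME way of processing it ends in a final state."""
    return any(result == "accept" for _, result in outcomes(edges, start, word, finals))

for path, result in outcomes(DOUBLE, "-", "baa", {"+"}):
    print(path, result)
# ('b', 'a', 'a') reject
# ('b', 'aa') accept
# ('ba',) crash

# Every word of length up to 6 is accepted exactly when it contains aa or bb.
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(DOUBLE, "-", {"+"}, w) == ("aa" in w or "bb" in w)
```

The three printed lines are the three fates of baa discussed above: rejection at the start state, acceptance, and a crash. Under the new definition the single accepting line is enough.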

We are about to create machines in which any edge in the picture can be labeled by any string of alphabet letters, but first we must consider some additional consequences. We could now encounter the following problem:


On this machine, we can accept the word baab in two different ways. First, we could take ba from the start state to state 1 and then ab would take us to the final state. Or else we could read the three letters baa and go to state 2, from which the final letter, b, would take us to the final state.

Previously, when we were dealing only with FAs, we had a unique path through the machine for every input string. Now some strings have no paths at all, while some have several. We now have observed many of the difficulties inherent in expanding our definition of "machine" to allow word-labeled edges (or, equivalently, to allow reading more than one letter of input at a time). We shall leave the definition of the finite automaton alone and call these new machines transition graphs because they are more easily understood when defined directly as graphs than as tables later turned into pictures.




DEFINITION 


A transition graph, abbreviated TG, is a collection of three things: 

1. A finite set of states, at least one of which is designated as the start state (—) and some 
(maybe none) of which are designated as final states ( + ). 

2. An alphabet Σ of possible input letters from which input strings are formed.

3. A finite set of transitions (edge labels) that show how to go from some states to some others, based on reading specified substrings of input letters (possibly even the null string Λ). ■

When we give a pictorial representation of a transition graph, clause 3 in the definition 
means that every edge is labeled by some string or strings of letters, not necessarily only one 
letter. We are also not requiring that there be any specific number of edges emanating from 
any state. Some states may have no edge coming out of them at all, and some may have 
thousands (e.g., edges labeled a, aa, aaa, aaaa , . . .). 

Transition graphs were invented by John Myhill in 1957 for reasons revealed in the next 
chapter. 

A successful path through a transition graph is a series of edges forming a path beginning at some start state (there may be several) and ending at a final state. If we concatenate in order the string of letters that label each edge in the path, we produce a word that is accepted by this machine.

For example, consider the following TG: 






The path from state 1 to state 2 to state 3 back to state 1 then to state 4 corresponds to the string (abb)(Λ)(aa)(b). This is one way of factoring the word abbaab, which, we now see, is accepted by this machine. Some other words accepted are abba, abbaaabba, and b.

When an edge is labeled with the string Λ, it means that we can take the ride it offers free (without consuming any letters from the input string). Remember that we do not have to follow that edge, but we can if we want to.

If we are presented with a particular string of a's and b's to run on a given TG, we must decide how to break the word into substrings that might correspond to the labels of edges in a path. If we run the input string abbab on the machine above, we see that from state 1, where we must start, we can proceed along the outgoing edge labeled abb or the one labeled b. This word then moves along the edge from state 1 to state 2. The input letters abb are read and consumed. What is left of the input string is ab, and we are now in state 2. From state 2, we must move to state 3 along the Λ-edge. At state 3, we cannot read aa, so we must read only a and go to state 4. Here, we have a b left in the input string but no edge to follow, so despite our best efforts we still must crash and reject the input string abbab.
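Finding a successful path amounts to finding a factoring of the input into edge labels. The sketch below is a minimal Python illustration; the edge list is reconstructed from the paths described in the text (abb from 1 to 2, Λ from 2 to 3, aa from 3 to 1, a from 3 to 4, b from 1 to 4), and because the picture is not reproduced, the actual machine may have further edges. The search works on pairs (state, letters consumed so far) and remembers which pairs it has seen, so Λ-edges cannot trap it in a loop; the empty string "" stands for Λ.

```python
from collections import deque

# Edges recovered from the discussion above: (source, label, target); "" is Λ.
EDGES = [(1, "abb", 2), (2, "", 3), (3, "aa", 1), (3, "a", 4), (1, "b", 4)]
START, FINALS = 1, {4}

def factor(word):
    """Breadth-first search over (state, letters consumed) configurations.
    Returns the edge labels of one successful path, or None when every
    attempt crashes or ends in a nonfinal state."""
    start = (START, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state, pos = queue.popleft()
        if pos == len(word) and state in FINALS:
            labels, node = [], (state, pos)
            while parent[node] is not None:      # walk back to the start
                node, label = parent[node]
                labels.append(label)
            return labels[::-1]
        for src, label, dst in EDGES:
            if src == state and word.startswith(label, pos):
                nxt = (dst, pos + len(label))
                if nxt not in parent:            # never revisit a configuration
                    parent[nxt] = ((state, pos), label)
                    queue.append(nxt)
    return None

print(factor("abbaab"))   # ['abb', '', 'aa', 'b']
print(factor("abbab"))    # None
```

The first result is the factoring (abb)(Λ)(aa)(b) given in the text, and abbab is rejected for the reason traced above.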

Because we have allowed some edges to be traversed for free, it is logical to allow for the possibility of more than one start state. The reason we say that these two points are related is that we could always introduce more start states if we wanted to, simply by connecting them to the original start state by edges labeled Λ. This point is illustrated by the following example. There is no real difference between the TG





and the TG 




in the sense that all the strings accepted by the first are accepted by the second and vice versa. There are differences between the two machines, such as the total number of states they have, but as language-acceptors they are equivalent.

It is extremely important for us to notice that every FA is also a TG. This means that any 
picture that represents an FA can be interpreted as a picture of a TG. Of course, not every 
TG satisfies the definition of an FA. 








LOOKING AT TGs

Let us consider some more examples of TGs. 


The picture above represents a TG that accepts nothing, not even the null string Λ. To be able to accept anything, it must have a final state.

The machine 






accepts only the string Λ. Any other string cannot have a successful path to the final state through labels of edges because there are no edges (and hence no labels).

Any TG in which some start state is also a final state will always accept the string Λ; this is also true of FAs. There are some other TGs that accept the word Λ. For example,

EXAMPLE

The following TG 


This machine accepts only the words Λ, baa, and abba. Anything read while in the + state will cause a crash, because the + state has no outgoing edges.


EXAMPLE 


The following TGs also accept only Λ:


accepts the language of all words that begin and end with different letters. This follows as a logical extension of the reasoning for the previous example. ■


EXAMPLE 


EXAMPLE 


Consider the following TG 


The following TG 


We can read all the input letters one at a time and stay in the left-side state. When we read a b in the − state, there are two possible edges we can follow. If the very last letter is a b, we can use it to go to the + state. This b must be the very last letter, because once in the right-side state, if we try to read another letter, we crash.

Notice that it is also possible to start with a word that does end with a b, but to follow an unsuccessful path that does not lead to acceptance. We could either make the mistake of following the nonloop b-edge too soon (on a nonfinal b), in which case we crash on the next letter, or else we might make the mistake of looping back to − when we read the last b, in which case we reject without crashing. But still, all words that end in b can be accepted by some path, and that is all that is required.
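Instead of guessing which b is the last one, a program can follow every path at once by remembering the set of states the machine could currently be in. This set-of-states simulation is a standard technique, not one the text uses at this point; the sketch below assumes the two-state TG just described (the − state loops on a and b, and a second b-edge leads to a + state with no outgoing edges), and the table name `MOVES` is hypothetical. It compares the result with the regular expression (a + b)*b, written `[ab]*b` in Python's `re` syntax.

```python
import re
from itertools import product

# The two-state TG described above, as a table from (state, letter) to the
# set of states that letter can lead to. "+" has no outgoing edges at all.
MOVES = {("-", "a"): {"-"}, ("-", "b"): {"-", "+"}}

def accepts(word):
    """Explore every path at once by tracking the set of states the machine
    could be in; a path with no edge to follow simply drops out (crashes)."""
    current = {"-"}
    for c in word:
        current = set().union(*(MOVES.get((q, c), set()) for q in current))
        if not current:                 # every path has crashed
            return False
    return "+" in current

# Compare with the regular expression (a + b)*b on all short words.
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert accepts(w) == bool(re.fullmatch(r"[ab]*b", w))
print(accepts("aab"), accepts("aba"))   # True False
```

Because the set always contains −, no input ever crashes every path here; the + state is in the set exactly when the letter just read was a b.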

The language accepted by this TG is all words ending in b. One regular expression for this language is (a + b)*b, and an FA that accepts the same language is


accepts the language of all words in which the a's occur only in even clumps and that end in three or more b's. There is never an edge that reads a single a, and it takes bbb at the end to get to +. ■


EXAMPLE 


Consider the following TG 


In this TG, every edge is labeled with a pair of letters. This means that for the string to be accepted, it must have an even number of letters that are read and processed in groups of two.

Instead of presenting a definite algorithm right now for determining whether a particular string is accepted by a particular TG, we shall wait until Chapter 11 when the task will be easier. There are, of course, difficult algorithms for performing this task that are within our abilities to analyze at this moment. One such algorithm is outlined in Problem 20 on page 91.

The existence of Λ-edges also allows for a new and completely unsettling set of possibilities: it allows infinite things to happen in seemingly finite situations.

Consider the following TG: 



Obviously, the only word accepted by this machine is the single word aa, but it can be accepted by infinitely many different paths. It is even possible to conceive that this machine accepts the word aa through paths of infinite length by looping infinitely many times before moving to the next state. But by our understanding, "paths" of necessity mean only "finite paths." Λ-loop-edges can make life difficult, and just as obviously their utility is nil. If we take any TG with Λ-loops and trim away these loops, the resultant picture is still a TG and accepts the same set of input strings. Why did we ever allow Λ-loops in the first place? One answer is so that we leave our definition as simple and universal-sounding as possible ("any edges, anywhere, with any labels") and another is that Λ-loops are not the only way of getting an infinite path out of a finite input string. Behold the Λ-circuit:


It is obvious how to eliminate this particular Λ-circuit, but with the machine


if any Λ option is erased, the resultant language is changed.

Yet another reason for not adding extra clauses to the definition of the TG to avoid this problem is that Λ-edges, as we shall see in Chapter 7, are never necessary at all, in the sense that any language that can be accepted by a TG with Λ-edges can be accepted by some different TG without Λ-edges.
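Infinitely many paths do not cause infinitely much work for an algorithm, because only finitely many states can be reached. A minimal sketch of this idea, the set of states reachable using Λ-edges alone (often called the Λ-closure), assuming the Λ-edges are given as (source, target) pairs; the circuit below is hypothetical and not the machine pictured above.

```python
def lambda_closure(states, lambda_edges):
    """Return every state reachable from `states` using only Λ-edges.
    The `seen` set guarantees termination even when the Λ-edges form
    loops or circuits, since each state is explored at most once."""
    seen = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for src, dst in lambda_edges:
            if src == q and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

# Hypothetical Λ-circuit 1 -> 2 -> 3 -> 1, plus a Λ-loop on state 2.
circuit = [(1, 2), (2, 3), (3, 1), (2, 2)]
print(sorted(lambda_closure({1}, circuit)))   # [1, 2, 3]
```

Going around the circuit a second time can never reach a state that the first trip missed, which is why the search may stop as soon as nothing new turns up.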
GENERALIZED TRANSITION GRAPHS


The ultimate step liberating state-to-state transitions is to allow the input to progress from one place to another by contributing a substring restricted to being a word in a predetermined language. For example,


We can travel from start to state 2 by reading any (of course finite) word from the (possibly infinite) set of choices L1 and, similarly, between all other states.

For the moment, we will not be so arbitrary as to allow just any language to be used as labels; we allow only those languages defined by regular expressions.

This gives us a new concept of a transition graph. 


DEFINITION 

A generalized transition graph (GTG) is a collection of three things: 

1. A finite set of states, of which at least one is a start state and some (maybe none) are final states.

2. An alphabet Σ of input letters.

3. Directed edges connecting some pairs of states, each labeled with a regular expression. ■
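Deciding whether a GTG accepts a word means asking whether the word can be cut into pieces, each of which is a word in the language of the regular expression on the edge it travels. The sketch below assumes Python's `re` module to test each piece; the three-state GTG in it is hypothetical (it is not the machine pictured in the next example), with labels (a + ba)* and Λ + b written as `(?:a|ba)*` and `b?` in `re` syntax.

```python
import re
from itertools import product

# Hypothetical GTG: "-" --(a + ba)*--> "mid" --(Λ + b)--> "+".
GTG = [("-", r"(?:a|ba)*", "mid"), ("mid", r"b?", "+")]
START, FINALS = "-", {"+"}

def gtg_accepts(word):
    """reached[i] is the set of states reachable after consuming word[:i].
    An edge may consume any substring word[i:j] that its regular expression
    matches in full; we repeat until nothing changes, because labels that
    match the empty string can add new states at the same position."""
    n = len(word)
    reached = [set() for _ in range(n + 1)]
    reached[0].add(START)
    changed = True
    while changed:
        changed = False
        for i in range(n + 1):
            for src, pattern, dst in GTG:
                if src not in reached[i]:
                    continue
                for j in range(i, n + 1):
                    if dst not in reached[j] and re.fullmatch(pattern, word[i:j]):
                        reached[j].add(dst)
                        changed = True
    return bool(reached[n] & FINALS)

# This particular GTG accepts exactly the words with no double b.
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert gtg_accepts(w) == ("bb" not in w)
print(gtg_accepts("b"), gtg_accepts("abba"))   # True False
```

Trying every substring against every label is slow, but for a word of n letters there are only finitely many cuts to try, so the procedure always finishes.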


EXAMPLE 


This machine accepts all strings without a double b. Notice that the word b takes a Λ-edge from start to middle.

In a very real sense, there is no difference between the Kleene star closure for regular expressions and a loop in our previous transition graphs, or FAs for that matter. Compare these two equivalent situations:

Even if we restrict labels to strings of only one letter or Λ, we may indirectly pei


We have already seen that in a TG a particular string of input letters may trace through the machine on different paths, depending on our choice of grouping. For instance, abb can go from state 3 to 4 or 5 in the middle of the three preceding examples, depending on whether we read the letters two and one or all three at once. The ultimate path through the machine is not determined by the input alone. Therefore, we say this machine is nondeterministic. Human choice becomes a factor in selecting the path; the machine does not make all its own determinations.


PROBLEMS


1. For each of the five FAs pictured in Problems 17, 19, and 20 in Chapter 3, build a transition graph that accepts the same language but has fewer states.

2. For each of the next 10 words, decide which of the six machines on the next page accept the given word.

(i) Λ

(ii) a 

(iii) b 

(iv) aa 

(v) ab 

(vi) aba 

(vii) abba 
(viii) bab

(ix) baab 

(x) abbb 





3. Show that any language that can be accepted by a TG can be accepted by a TG with an 
even number of states. 

4. How many different TGs are there over the alphabet {a b } that have two states? 

5. Prove that for every TG there is another TG that accepts the same language but has only one + state.

6. Build a TG that accepts the language L1 of all words that begin and end with the same double letter, either of the form aa . . . aa or bb . . . bb. Note: aaa and bbb are not words in this language.

7. If OURSPONSOR is a language that is accepted by a TG called Henry, prove that there 
is a TG that accepts the language of all strings of a's and b’s that end in a word from 
OURSPONSOR. 


8. (i) Suppose that L is a finite language whose words are w1, w2, w3, . . . , w83. Prove that there is a TG that accepts exactly the language L.




16. Let the language L be accepted by the transition graph T and let L not contain the word ba. We want to build a new TG that accepts exactly L and the word ba.

(i) One suggestion is to draw an edge from − to + and label it ba. Show that this does not always work.

(ii) Another suggestion is to draw a new + state and draw an edge from any − state to it labeled ba. Show that this does not always work.

(iii) What does work?

17. Let L be any language. Let us define the transpose of L to be the language of exactly those words that are the words in L spelled backward. If w ∈ L, then reverse(w) ∈ transpose(L). For example, if

L = {a abb bbaab bbbaa}

then

transpose(L) = {a bba baabb aabbb}

(i) Prove that if there is an FA that accepts L, then there is a TG that accepts the transpose of L.

(ii) Prove that if there is a TG that accepts L, then there is a TG that accepts the transpose of L.

Note: It is true, but much harder to prove, that if an FA accepts L, then some FA accepts the transpose of L. However, after Chapter 7 this will be trivial to prove.

(iii) Prove that transpose(L1L2) = transpose(L2) transpose(L1).
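For finite languages, both the transpose and the product are easy to compute, which makes it possible to test the example and part (iii) on small cases before attempting a proof. A minimal sketch; the helper names `transpose` and `concat` are hypothetical, and the second language pair is an arbitrary choice. A passing check on one instance is evidence, not a proof.

```python
def transpose(lang):
    """Spell every word of a finite language backward."""
    return {w[::-1] for w in lang}

def concat(l1, l2):
    """The product L1L2: every word of L1 followed by every word of L2."""
    return {u + v for u in l1 for v in l2}

L = {"a", "abb", "bbaab", "bbbaa"}
print(sorted(transpose(L)))   # ['a', 'aabbb', 'baabb', 'bba']

# One small instance of part (iii): note that the factors swap places.
L1, L2 = {"a", "ab", "bba"}, {"b", "ba"}
assert transpose(concat(L1, L2)) == concat(transpose(L2), transpose(L1))
```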

18. Transition graph T accepts language L. Show that if L has a word of odd length, then T has an edge with a label with an odd number of letters.

19. A student walks into a classroom and sees on the blackboard a diagram of a TG with two states that accepts only the word Λ. The student reverses the direction of exactly one edge, leaving all other edges and all labels and all +'s and −'s the same. But now the new TG accepts the language a*. What was the original machine?

20. Let us now consider an algorithm for determining whether a specific TG that has no Λ-edges accepts a given word:

Step 1 Number each edge in the TG in any order with the integers 1, 2, 3, . . . , x, where x is the number of edges in the TG.

Step 2 Observe that if the word has y letters and is accepted at all by this machine, it can be accepted by tracing a path of not more than y edges.

Step 3 List all strings of y or fewer integers, each of which ≤ x. This is a finite list.

Step 4 Check each string on the list in step 3 by concatenating the labels of the edges involved to see whether they make a path from a − to a + corresponding to the given word.

Step 5 If there is a string in step 4 that works, the word is accepted. If none work, the word is not in the language of the machine.

(i) Prove that this algorithm does the job.

(ii) Why is it necessary to assume that the TG has no Λ-edges?
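Steps 1 through 5 can be transcribed almost literally. The sketch below is deliberately brute force: for a TG with x edges and a word of y letters it may examine x^0 + x^1 + . . . + x^y integer strings, which is hopeless in practice but always finite. It assumes edges are given as (source, label, target) triples with nonempty labels, and it reuses a hypothetical encoding of the two-state double-letter machine from earlier in the chapter as test data.

```python
from itertools import product

def brute_force_accepts(edges, starts, finals, word):
    """Steps 1-5 for a TG with no Λ-edges. Edge number k (Step 1) is
    edges[k - 1]; each edge is (source, label, target) with a nonempty label."""
    x, y = len(edges), len(word)
    for length in range(y + 1):                               # Step 3: y or fewer integers
        for seq in product(range(1, x + 1), repeat=length):   # each from 1 to x
            path = [edges[k - 1] for k in seq]
            if not path:                                      # the empty path
                if word == "" and starts & finals:
                    return True
                continue
            # Step 4: consecutive edges must join up, run from a - to a +,
            # and their labels must concatenate to the given word.
            joined = all(e[2] == f[0] for e, f in zip(path, path[1:]))
            if (joined and path[0][0] in starts and path[-1][2] in finals
                    and "".join(label for _, label, _ in path) == word):
                return True                                   # Step 5: accepted
    return False                                              # Step 5: nothing worked

# Hypothetical encoding of the two-state double-letter machine.
DOUBLE = [("-", "a", "-"), ("-", "b", "-"), ("-", "aa", "+"), ("-", "bb", "+"),
          ("+", "a", "+"), ("+", "b", "+")]
for n in range(5):
    for w in map("".join, product("ab", repeat=n)):
        assert brute_force_accepts(DOUBLE, {"-"}, {"+"}, w) == ("aa" in w or "bb" in w)
```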


Hint: Why is the answer not always L*? 

12. (i) Let the language L be accepted by the transition graph T and let L not contain the word Λ. Show how to build a new TG that accepts exactly all the words in L and the word Λ.

(ii) Given TG1 that accepts the language L1, show how to build a TG that accepts the language L1*. (Hint: Use Problems 11 and 12(i) and sound authoritative.)

13. Using the results of Problems 8, 9, 10, and 12 in an organized fashion, prove that if L is any language that can be defined by a regular expression, then there is a TG that accepts exactly the language L*.

14. Verify that there are indeed three and only three ways for the TG on p. 84 to accept the word abbbabbbabba.

15. An FA with four states was sitting unguarded one night when vandals came and stole an edge labeled a. What resulted was a TG that accepted exactly the language b*. In the morning the FA was repaired, but the next night vandals stole an edge labeled b, and what resulted was a TG that accepted a*. The FA was again repaired, but this time the vandals stole two edges, one labeled a and one labeled b, and the resultant TG accepted the language a* + b*. What was the original FA?


(ii) Of all TGs that accept exactly the language L, what is the one with the fewest number of states?

9. Given a TG, called TG1, that accepts the language L1 and a TG, called TG2, that accepts the language L2, show how to build a new TG (called TG3) that accepts exactly the language L1 + L2.

10. Given TG1 and TG2 as described in Problem 9, show how to build TG4 that accepts exactly the language L1L2.

11. Given a TG for some arbitrary language L, what language would it accept if every + state were to be connected back to every − state by Λ-edges? For example, by this method,


becomes
$$[\mathrm{ZAPS} \subseteq \mathrm{ZEPS} \subseteq \mathrm{ZIPS} \subseteq \mathrm{ZAPS}] \Rightarrow [\mathrm{ZAPS} = \mathrm{ZEPS} = \mathrm{ZIPS}]$$


PROOF

The three sections of our proof will be:

Part 1 Every language that can be defined by a finite automaton can also be defined by a transition graph.

Part 2 Every language that can be defined by a transition graph can also be defined by a regular expression.

Part 3 Every language that can be defined by a regular expression can also be defined by a finite automaton.

When we have proven these three parts, we have finished our theorem.

Proof of Part 1

This is the easiest part. Every finite automaton is itself already a transition graph. Therefore, any language that has been defined by a finite automaton has already been defined by a transition graph. Done.
Proof of Part 2

The proof of this part will be by constructive algorithm. This means that we present a procedure that starts out with a transition graph and ends up with a regular expression that defines the same language. To be acceptable as a method of proof, any algorithm must satisfy two criteria. It must work for every conceivable TG, and it must guarantee to finish its job in a finite time (a finite number of steps). For the purposes of theorem-proving alone, it does not have to be a good algorithm (quick, least storage used, etc.). It just has to work in every case.

Let us start by considering an abstract transition graph T. T may have many start states. We first want to simplify T so that it has only one start state that has no incoming edges. We do this by introducing a new state that we label with a minus sign and that we connect to all the previous start states by edges labeled with Λ. Then we drop the minus signs from the previous start states. Now all inputs must begin at the new unique start state. From there, they can proceed free of charge to any of the old start states. If the word w used to be accepted by starting at previous start state 3 and proceeding through the machine to a final state, it can now be accepted by starting at the new unique start state and progressing to the old start state 3 along the edge labeled Λ. This trip does not use up any of the input letters. The word then picks up its old path and becomes accepted. This process is illustrated below on a fragment of a TG that has three start states: 1, 3, and 5:


CHAPTER 7


Kleene's Theorem


UNIFICATION

In the last three chapters, we introduced three separate ways of defining a language: generation by regular expression, acceptance by finite automaton, and acceptance by transition graph. In this chapter, we will present a theorem proved by Kleene in 1956, which (in our version) says that if a language can be defined by any one of these three ways, then it can also be defined by the other two. One way of stating this is to say that all three of these methods of defining languages are equivalent.


THEOREM 6

Any language that can be defined by

regular expression, or

finite automaton, or

transition graph

can be defined by all three methods. ■

This theorem is the most important and fundamental result in the theory of finite automata. We are going to take extreme care with its proof. In the process, we shall introduce four algorithms that have the practical value of enabling us actually to construct the corresponding machines and expressions. More than that, the importance of this chapter lies in its value as an illustration of thorough theoretical thinking in this field.

The logic of this proof is a bit involved. If we were trying to prove the mathematical theorem that the set of all ZAPS (whatever they are) is the same as the set of all ZEPS, we could break the proof into two parts. In Part 1, we would show that all ZAPS are also ZEPS. In Part 2, we would show that all ZEPS are also ZAPS. Together, this would demonstrate the equivalence of the two sets.

Here, we have a more ambitious theorem. We wish to show that the set of ZAPS, the set of ZEPS, and the set of ZIPS are all the same. To do this, we need three parts. In Part 1, we shall show that all ZAPS are ZEPS. In Part 2, we shall show that all ZEPS are ZIPS. Finally, in Part 3, we shall show that all ZIPS are ZAPS. Taken together, these three parts will establish the equivalence of the three sets.








where the labels r1 and r2 are each regular expressions or simple strings. We can replace this with a single edge that is labeled with the regular expression r1 + r2:


The ellipses in the pictures above indicate other sections of the TG that are irrelevant because they contain no start states.
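The start-state simplification is easy to mechanize and to test. The sketch below assumes a TG stored as a list of (source, label, target) edges with "" standing for Λ; the name `new-` for the added state, the helper functions, and the three-start fragment are all hypothetical. It checks on a few words that the language is unchanged and that nothing points into the new start state.

```python
from collections import deque

NEW_START = "new-"   # hypothetical name for the added start state

def add_unique_start(edges, starts):
    """Return (new edge list, {new start}). The new start state is joined by a
    Λ-edge (label "") to each old start state, which loses its minus sign.
    Nothing points into the new state."""
    lambda_edges = [(NEW_START, "", s) for s in sorted(starts, key=str)]
    return lambda_edges + list(edges), {NEW_START}

def accepts(edges, starts, finals, word):
    """Search over (state, letters consumed) pairs; the seen set keeps
    Λ-edges from sending the search around in circles."""
    queue = deque((s, 0) for s in starts)
    seen = set(queue)
    while queue:
        state, pos = queue.popleft()
        if pos == len(word) and state in finals:
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, pos):
                nxt = (dst, pos + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical fragment with three start states 1, 3, and 5, as in the text.
edges = [(1, "a", 2), (3, "bb", 2), (5, "ab", 2)]
new_edges, new_starts = add_unique_start(edges, {1, 3, 5})
for w in ["a", "bb", "ab", "b", "abb", ""]:
    assert accepts(edges, {1, 3, 5}, {2}, w) == accepts(new_edges, new_starts, {2}, w)
assert all(dst != NEW_START for _, _, dst in new_edges)   # no incoming edges
```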

Another simplification we can make in T is that it can be modified to have a unique unexitable final state without changing the language it accepts. If T had no final states to begin with, then it accepts no strings at all and has no language, and we need produce no regular expression other than the null, or empty, expression Φ (see p. 36). If T has several final states, let us un-final them and instead introduce a new unique final state labeled with a plus sign. We draw new edges from all the former final states to the new one, dropping the old plus signs, and labeling each new edge with the null string Λ. When an input string runs out of letters and it is in an old final state, it can now take a free Λ-edge ride to the new unique final state. This process is depicted below:


where r1, r2, and r3 are all regular expressions or simple strings. In this case, we can replace the three loops by one loop labeled with a regular expression:


The meaning here is that from state x we can read any one string from the input that fits the regular expression r1 + r2 + r3 and return to the same state.
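Merging loops, and more generally any edges that share a source and a target, is pure bookkeeping. A minimal sketch, assuming edges are (source, label, target) triples whose labels are regular expressions kept as plain strings in the book's notation; the function name is hypothetical. Because + has the lowest precedence of the regular expression operators, the labels can be joined with + without adding parentheses.

```python
from collections import defaultdict

def merge_parallel(edges):
    """Replace all edges that share a source and a target (including several
    loops on one state) by one edge labeled with the sum r1 + r2 + ... ."""
    grouped = defaultdict(list)
    for src, label, dst in edges:
        grouped[(src, dst)].append(label)
    return [(src, " + ".join(labels), dst) for (src, dst), labels in grouped.items()]

print(merge_parallel([("x", "r1", "x"), ("x", "r2", "x"), ("x", "r3", "x"), ("x", "ab", "y")]))
# [('x', 'r1 + r2 + r3', 'x'), ('x', 'ab', 'y')]
```

After this step every ordered pair of states is joined by at most one edge, which is the form the elimination steps below assume.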

Similarly, suppose two states are connected by more than one edge going in the same 
direction: 
This becomes


becomes 


The new final state has no outgoing edges. 
We shall require that the unique final state be a different state from the unique start state. If an old state used to have ±, then both signs are removed from it to newly created states.

It should be clear that the addition of these two new states does not affect the language that T accepts. Any word accepted by the old T is also accepted by the new T, and any word rejected by the old T is also rejected by the new T. Furthermore, the machine now has the following shape:


where there are no other — or + states. If the TG was already in this shape, this step could 
have been skipped but, even then, executing it could not have hurt either. 
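The final-state simplification can be checked the same way as the start-state one. The sketch below assumes edges are (source, label, target) triples with "" for Λ; the state name `new+`, the helper functions, and the small TG (whose start state 1 is also final, so it carries ±) are hypothetical. The test confirms that the language is unchanged and that the new final state has no outgoing edges.

```python
from collections import deque

NEW_FINAL = "new+"   # hypothetical name for the added final state

def add_unique_final(edges, finals):
    """Un-final every old + state and join each one to a fresh final state
    by a Λ-edge (label ""). The fresh state has no outgoing edges and,
    being new, cannot coincide with the start state."""
    lambda_edges = [(f, "", NEW_FINAL) for f in sorted(finals, key=str)]
    return list(edges) + lambda_edges, {NEW_FINAL}

def accepts(edges, starts, finals, word):
    """Search over (state, letters consumed) pairs with a seen set."""
    queue = deque((s, 0) for s in starts)
    seen = set(queue)
    while queue:
        state, pos = queue.popleft()
        if pos == len(word) and state in finals:
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, pos):
                nxt = (dst, pos + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical TG: start state 1 is also final (±), and state 3 is final.
edges = [(1, "a", 2), (2, "b", 3), (3, "a", 1)]
new_edges, new_finals = add_unique_final(edges, {1, 3})
for w in ["", "ab", "aba", "abab", "b", "ba"]:
    assert accepts(edges, {1}, {1, 3}, w) == accepts(new_edges, {1}, new_finals, w)
assert all(src != NEW_FINAL for src, _, _ in new_edges)   # no outgoing edges
```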

We are now going to build piece by piece the regular expression that defines the same 
language as T. To do so, we will change T into a GTG. 

Let us suppose that T has some state (called state x) inside it (not the — or + state) that 
has more than one loop circling back to itself: 







We say “replace” bee 
state 2 to state 3 unle 
We see that in this way we can eliminate the edge from state 1 to state 2, bypassing state 2 altogether.

In fact, every state that leads into state 2 can be made to bypass state 2. If state 9 leads into state 2, we can eliminate the edge from state 9 to state 2 by adding edges from state 9 to states 3, 4, and 5 directly. We can repeat this process until nothing leads into state 2. When this happens, we can eliminate state 2 entirely, because it then cannot be in a path that accepts a word. We drop the whole state, and the edges leading from it, from the picture for T.

What have we done to transition graph T? Without changing the set of words that it accepts, we have eliminated one of its states.

We can repeat this process again and again until we have eliminated all the states from T 
except for the unique start state and the unique final state. (We shall illustrate this presently.) 
What we come down to is a picture that looks like this: 


with each edge labeled by a regular expression. We can then combine this once more to produce


The resultant regular expression is then the regular expression that defines the same language T did originally.
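The whole elimination procedure can be sketched in Python. This is a minimal sketch, not the book's own formulation: labels are written in Python `re` syntax (`|` for the book's +, the empty string for Λ) so that the final expression can be tested with `re.fullmatch`; parallel edges are assumed to be merged already, and the state names are hypothetical. Each bypass label is the toll into the state, any number of trips around its loop, and the toll out; when an incoming state is also an outgoing state, the detour becomes a new loop on that state. Different elimination orders give different-looking but equivalent expressions.

```python
import re
from itertools import product

def wrap(r):
    """Parenthesize a nonempty label; the empty string stands for Λ."""
    return f"(?:{r})" if r else ""

def eliminate(edges, start, final):
    """Reduce a TG to one start-to-final edge and return its label.

    edges: dict {(p, q): label} in Python `re` syntax, at most one edge per
    ordered pair. Assumes the start state has no incoming edges, the final
    state has no outgoing edges, and start != final.
    """
    edges = dict(edges)
    inner = {s for pair in edges for s in pair} - {start, final}
    for x in sorted(inner, key=str):
        loop = edges.pop((x, x), None)
        middle = f"(?:{loop})*" if loop else ""          # Λ* is just Λ
        incoming = [(p, r) for (p, q), r in edges.items() if q == x]
        outgoing = [(q, r) for (p, q), r in edges.items() if p == x]
        # Bypass x: toll in, any number of trips around the loop, toll out.
        for p, r_in in incoming:
            for q, r_out in outgoing:
                bypass = wrap(r_in) + middle + wrap(r_out)
                old = edges.get((p, q))
                edges[(p, q)] = bypass if old is None else f"{old}|{bypass}"
        # Drop x together with every edge that touches it.
        edges = {(p, q): r for (p, q), r in edges.items() if x not in (p, q)}
    return edges.get((start, final))   # None: no path at all, empty language

# The "ends in b" TG from Chapter 6, after adding a new start S and a new
# final F joined to it by Λ-edges.
ends_in_b = {("S", "1"): "", ("1", "1"): "a|b", ("1", "2"): "b", ("2", "F"): ""}
regex = eliminate(ends_in_b, "S", "F")
print(regex)   # (?:(?:a|b)*(?:b))
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert bool(re.fullmatch(regex, w)) == w.endswith("b")
```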
For example, if we have
we can bypass state 2 by introducing a path from state 1 to state 4 labeled aba*ba, a path from state 1 to state 5 labeled aba*b, a path from state 3 to state 4 labeled bbba*ba, and a path from state 3 to state 5 labeled bbba*b. We can then erase the edges from state 1 to state 2 and from state 3 to state 2. Without these edges, state 2 becomes unreachable. The edges from state 2 to states 4 and 5 are then useless because they cannot be part of any path from − to +. Dropping this state and these edges will not affect whether any word is accepted by this TG.
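The four bypass labels follow one pattern: label in, loop label starred, label out. The sketch below assumes the edges of the pictured example as they can be read off from those labels (ab from state 1 to state 2, bbb from state 3 to state 2, a loop a on state 2, ba from state 2 to state 4, and b from state 2 to state 5); the figure itself is not reproduced, and the function name is hypothetical. It only handles simple string labels, since a label containing + would need parentheses before being concatenated.

```python
def bypass_labels(incoming, loop, outgoing):
    """Book-notation labels for the detours that replace a bypassed state.
    incoming: {source: label into the state}; loop: label of its self-loop
    (or None); outgoing: {target: label out of the state}."""
    def group(r):   # a one-letter loop label can be starred without parentheses
        return r if len(r) == 1 else f"({r})"
    middle = f"{group(loop)}*" if loop else ""
    return {(p, q): f"{r_in}{middle}{r_out}"
            for p, r_in in incoming.items() for q, r_out in outgoing.items()}

print(bypass_labels({1: "ab", 3: "bbb"}, "a", {4: "ba", 5: "b"}))
# {(1, 4): 'aba*ba', (1, 5): 'aba*b', (3, 4): 'bbba*ba', (3, 5): 'bbba*b'}
```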

The machine that results from this operation is 




If there had previously been any edges from state 1 to state 5, we leave these alone. 

If we wish to eliminate a given state, say, state 2, we must first list all the edges going into that state from other states (say, from states 7 and 9) and also make a list of all the states that could be reached from state 2 by an edge (say, states 11, 4, and 5). If state 2 were to disappear, it would interrupt all the paths input strings could have taken that pass through it on their way to +. We do not wish to destroy any possible paths input strings might take, because that could change the language by killing some input string's only path to acceptance, which would eliminate it from the language of the machine. It is too hard for us to check whether all the accepted input strings have some alternate path to acceptance that does not pass through state 2, so we make a careful point of replacing all destroyed routes with equivalent detours.

It is our requirement to be sure that whatever change we make in the machine, all the strings that could previously have been accepted can still be accepted by the modified machine. In order to safely eliminate state 2 without disturbing any routes from - to +, we must install bypass roads going from each incoming state to every outgoing state and be sure that the labels of the bypass roads correspond to the trips obliterated.

In this hypothetical example, we must replace routes from state 7 to states 11, 4, and 5 and from state 9 to states 11, 4, and 5. When we draw these new edges, we must label them with the appropriate tolls that are the charges of going into state 2, around state 2, and out of state 2. If the machine segment we are analyzing started by looking like:
Before we claim to have finished describing this algorithm, there are some special cases
that we must examine more carefully. In the picture 



we might want to eliminate state 2. This is an illustration of the possibility that one of the 
source states to the prospective bypassed state is also a destination state from that state. 

This case is really not different from the general situation described above. We still need to replace all the paths through the machine that previously went through state 2. The incoming states are 1 and 3, and the outgoing state is only 1. Therefore, we must add edges connecting state 3 to state 1 and state 1 to state 1. The edge we shall add to connect state 1 to itself is a loop that summarizes and replaces the trip from 1 to 2 to 1. The machine then becomes


[Figure: state 1 now has a loop labeled r1r2*r3, and the edge from state 3 to state 1 is labeled r4r2*r3]


Originally, it was possible to take a path from state 3 to state 2 to state 1 to state 2 and back to state 1 again at the cost of r4r2*r3r1r2*r3. This path is still represented in the reduced machine. It is reflected in the 3-1 edge r4r2*r3 followed by the loop at state 1, r1r2*r3. Therefore, no real problem arises even when the set of incoming states and the set of outgoing states have some overlap.
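The bypass step for a single state is mechanical enough to sketch in a few lines of Python. This is an illustration under assumptions of our own, not the book's notation: a TG is stored as a dictionary mapping a (from, to) pair of states to one regular-expression label (parallel edges already unified), labels are treated as opaque strings, and the function name `bypass` is hypothetical. Each incoming label is followed by the starred loop label, if there is one, and then by each outgoing label, exactly as in the overlapping case just discussed.

```python
def bypass(edges, q):
    """Eliminate state q from a TG whose edges map (p, r) -> label string.

    Every pair (incoming edge p->q, outgoing edge q->r) is replaced by one
    edge p->r labeled (in)(loop)*(out); the loop factor is omitted when q
    has no loop. A new edge parallel to an existing one is unified with '+'.
    """
    loop = edges.get((q, q))
    middle = f"({loop})*" if loop is not None else ""
    incoming = [(p, lab) for (p, r), lab in edges.items() if r == q and p != q]
    outgoing = [(r, lab) for (p, r), lab in edges.items() if p == q and r != q]
    # Keep every edge that does not touch q.
    result = {k: lab for k, lab in edges.items() if q not in k}
    for p, a in incoming:
        for r, b in outgoing:
            new = f"({a}){middle}({b})"
            result[(p, r)] = f"{result[(p, r)]} + {new}" if (p, r) in result else new
    return result


# The overlapping case from the text: 1 -> 2 (r1), loop at 2 (r2), 2 -> 1 (r3), 3 -> 2 (r4).
tg = {(1, 2): "r1", (2, 2): "r2", (2, 1): "r3", (3, 2): "r4"}
print(bypass(tg, 2))  # {(1, 1): '(r1)(r2)*(r3)', (3, 1): '(r4)(r2)*(r3)'}
```

The new loop at state 1 and the new 3-1 edge carry exactly the labels r1r2*r3 and r4r2*r3 described above.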
As it stands, this machine has only one start state with no incoming edges, but it has two final states, so we must introduce a new unique final state following the method prescribed by the algorithm:


The next modification we perform is to note that the edge from the start state to state 1 is a double edge: we can travel over it by an aa or a bb. We replace this by the regular expression aa + bb. We also note that there is a double loop at state 1. We can loop back to state 1 on a single a or on a single b. The algorithm says we are supposed to replace this double loop by a single loop labeled with the regular expression a + b. The picture of the machine has now become


The algorithm does not actually tell us which state of the TG we must bypass next. The order of elimination is left up to our own discretion. The algorithm (when we formally state it) implies that it really does not matter. As long as we continue to eliminate states, we shall be simplifying the machine down to a single regular expression representation.

Let us choose state 2 for elimination. The only path we are now concerned with is the one that goes from state 1 to state 2 on aa and then from state 2 to state + on Λ.

The algorithm says we can replace this with one edge from state 1 to state + that bears a label that is the concatenation of the regular expressions on the two parts of the path. In this case, aa is concatenated with Λ, which is only aa again. Once we have eliminated the edge from state 1, we can eliminate state 2 entirely. The machine now looks like this:


It seems reasonable now for us to choose to eliminate state 3 next. But the algorithm does not require us to be reasonable, and because this is an illustrative example and we have already seen something like this path, we shall choose a different section of T to modify.

The technique described above does not require us to choose the order of eliminating states in a logical, efficient, intelligent, or aesthetic manner. All these considerations are completely inappropriate to the consideration of what is an algorithm. An algorithm must be so clearly stated that it works successfully no matter how little forethought, experience, cleverness, or artistic sensibility the applier of the procedure possesses. The algorithm must be able to be completely and successfully executed by a dimwit, a half-wit, or even a no-wit such as a computer. To execute an algorithm, all we are allowed to presume on the part of the executing agent is tireless diligence and immaculate precision.

If we could presume that gifted insight on the part of the executor was routinely available, the algorithm would be much simpler:


Step 1 Look at the machine, figure out its language, and write down an equivalent regular expression.

Unfortunately, people are not as reliably creative as they are reliable drones, and the whole purpose of an algorithm is so that we can get some jobs done on a daily basis without waiting for Da Vinci to be in the suitable mood. All the requisite cleverness must be incorporated into the algorithm itself by the creator of the algorithm.

If we want the algorithm to be efficient, we must design one that will force the drone to turn out efficient products. If we want the output to be aesthetic, we must build that in. Computer science courses that are concerned with how good an algorithm is are fundamentally different from this course. We are primarily concerned with whether an algorithm to accomplish a certain task exists or not; we are never in search of the "best" one by any standards of what it means to be best. That said, we shall occasionally present more than one algorithm for accomplishing a certain task, but the reason for this will always be that each of the algorithms we develop can be generalized to other tasks in different directions. As such, they are each the seed of different classes of procedures and each deserves individual attention.

Let us continue with the example of the TG we are in the process of reducing to a regular expression. Let us stubbornly insist on bypassing state 1 before eliminating state 3.

Only one edge comes into state 1, and that is from state -. There is a loop at state 1 with the label (a + b). State 1 has edges coming out of it that lead to state 3 and state +.

The algorithm explains that we can eliminate state 1 and replace these edges with an edge from state - to state 3 labeled (aa + bb)(a + b)*(bb) and an edge from state - to state + labeled (aa + bb)(a + b)*(aa).

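After we eliminate state 1, the machine looks like this:

[Figure: an edge from - to + labeled (aa + bb)(a + b)*aa, and an edge from - to state 3 labeled (aa + bb)(a + b)*bb, followed by the edge from state 3 to +]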


It is obvious that we must now eliminate state 3, because that is the only bypassable state left. When we concatenate the regular expression from state - to state 3 with the regular expression from state 3 to state +, we are left with the machine

[Figure: two edges from - to +, labeled (aa + bb)(a + b)*aa and (aa + bb)(a + b)*bb]

Now by the last rule of the algorithm, this machine defines the same language as the regular expression

(aa + bb)(a + b)*(aa) + (aa + bb)(a + b)*(bb) 


It is entirely conceivable that if we eliminated the states in a different order, we could end up with a different-looking regular expression. But by the logic of the elimination process, these expressions would all have to represent the same language.

If we had to make up a regular expression for the language of all strings that begin and end with double letters (where the opening and closing double letters do not overlap, so that aaa, for example, is excluded), we would probably have written

(aa + bb)(a + b)*(aa + bb)

which is equivalent to the regular expression that the algorithm produced because the algebraic distributive law applies to regular expressions. ■
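Testing on finitely many strings proves nothing, and the distributive law is what settles the question; still, a brute-force comparison is a handy sanity check after eliminating states by hand. The sketch below assumes Python's `re` module, in which the book's + is written |; the helper names `words` and `agree` are our own.

```python
import re
from itertools import product

# The book's + (union) is written | in Python regular expressions.
ALGORITHM_OUTPUT = r"(aa|bb)(a|b)*(aa)|(aa|bb)(a|b)*(bb)"
HAND_WRITTEN = r"(aa|bb)(a|b)*(aa|bb)"


def words(max_len, alphabet="ab"):
    """Yield every string over the alphabet of length 0 through max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def agree(r1, r2, max_len=10):
    """True if r1 and r2 accept exactly the same strings up to max_len letters."""
    return all(bool(re.fullmatch(r1, w)) == bool(re.fullmatch(r2, w))
               for w in words(max_len))


print(agree(ALGORITHM_OUTPUT, HAND_WRITTEN))  # True
```

A disagreement would immediately point to a string that one expression accepts and the other rejects, which usually locates a slip in the elimination.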

Without going through lengthy descriptions, let us watch the algorithm work on one more example. Let us start with the TG that accepts strings with an even number of a's and an even number of b's, the language EVEN-EVEN. (We keep harping on these strings not because they are so terribly important, but because it is the hardest example we thoroughly understand to date, and rather than introduce new hard examples, we keep it as an old conquest.)







Turning TGs into Regular Expressions 105

it is not a complete algorithm if it breaks down in any case, no matter how remote or freakish an occurrence. How can we tell when we have covered all possibilities? Who knows? There is no algorithm to tell whether the algorithm we have proposed has omitted an important case; but here is a surprise: this very statement about the limitations of analyzing algorithms by other algorithms will be proven later on in this book.

Let us consider a complicated, most general-looking case and see whether our simple rules work on it without the introduction of any new difficulties. Consider the TG fragment below:


Our state targeted for bypass is state 2. Proceeding in an orderly fashion, we list all the states connected to state 2 by incoming and outgoing edges. The incoming edges are from states 1 and 3; the outgoing edges are to states 3, 4, and 5. Because each previously possible path must still exist, we need to introduce six new edges (including the loop at 3):


From To Labeled 


Because there is already a loop at state 3, we can add this regular expression to the existing 
one and the resultant picture is this: 
State 2 has disappeared, but all paths that used to travel through it remain possible. Equally important, no new paths are possible in this new TG that were not possible for the same cost of input letters in the original TG.

For example, the old trip through states 1-2-4-4-1-2-3-3-2-5 can still be made. It now, however, travels through the state sequence 1-4-4-1-3-3-5, whose concatenation of regular expressions is exactly the same as before.




ALGORITHM 


Now that we already have a fairly good idea of what the state-elimination algorithm is about, we are ready to present a semiformal statement of the general rules defining the constructive algorithm that proves that all TGs can be turned into regular expressions that define the exact same language:

Step 1 Create a unique, unenterable minus state and a unique, unleaveable plus state.

Step 2 One by one, in any order, bypass and eliminate all the states of the TG other than the - and + states. A state is bypassed by connecting each incoming edge with each outgoing edge. The label of each resultant edge is the concatenation of the label on the incoming edge, the starred label on the loop edge if there is one, and the label on the outgoing edge.

Step 3 When two states are joined by more than one edge going in the same direction, unify them by adding their labels.

Step 4 Finally, when all that is left is one edge from - to +, the label on that edge is a regular expression that generates the same language as was recognized by the original machine.
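Steps 1 through 4 can be sketched in Python as follows. This is a minimal sketch under assumptions of our own, not a definitive implementation: edges are given as (from, to, label) triples, labels are written in Python `re` syntax (| for the book's +, the empty string for Λ), and the name `tg_to_regex` is hypothetical. The sketch never simplifies its output, so it returns a longer but equivalent expression. The example TG is our reading of the worked example earlier in this section, since the figures are not reproduced: state - goes to state 1 on aa or bb, state 1 loops on a or b, and state 1 reaches final state 2 on aa and final state 3 on bb.

```python
import re


def tg_to_regex(edges, states, start, finals):
    """Sketch of Steps 1-4: turn a TG into one regular expression.

    edges:  list of (from, to, label) triples; labels use Python re syntax,
            with '|' for the book's + and '' (the empty string) for Λ.
    states: the TG's own states, bypassed in exactly the order given.
    Returns a regex string, or None if no path leads from start to a final state.
    """
    g = {}

    def add(p, q, label):
        # Step 3: edges joining p to q in the same direction are unified.
        g[(p, q)] = f"{g[(p, q)]}|{label}" if (p, q) in g else label

    for p, q, label in edges:
        add(p, q, label)

    # Step 1: a new unenterable minus state and a new unleaveable plus state,
    # joined to the old start and final states by Λ-edges.
    MINUS, PLUS = object(), object()
    add(MINUS, start, "")
    for f in finals:
        add(f, PLUS, "")

    # Step 2: bypass and eliminate every original state, one by one.
    for x in states:
        loop = g.pop((x, x), None)
        middle = f"({loop})*" if loop is not None else ""
        incoming = [(p, lab) for (p, q), lab in g.items() if q == x]
        outgoing = [(q, lab) for (p, q), lab in g.items() if p == x]
        for key in [k for k in g if x in k]:
            del g[key]
        for p, a in incoming:
            for q, b in outgoing:
                add(p, q, f"({a}){middle}({b})")

    # Step 4: the label on the one edge left from minus to plus.
    return g.get((MINUS, PLUS))


example = [("-", "1", "aa"), ("-", "1", "bb"), ("1", "1", "a"), ("1", "1", "b"),
           ("1", "2", "aa"), ("1", "3", "bb")]
regex = tg_to_regex(example, ["2", "1", "3", "-"], "-", {"2", "3"})
print(bool(re.fullmatch(regex, "bbabaa")))  # True
print(bool(re.fullmatch(regex, "aabab")))   # False
```

Eliminating in another order, for instance 3, 1, 2, -, yields a different-looking string that matches exactly the same words, as the text predicts. When no final state can be reached, the function returns None, mirroring the null-language case discussed below.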




We have waffled about calling this representation a "semiformal" description of the procedure. The addition of phrases (or symbols) that say things like "for all states q_x that connect to state q_y by a single directed edge (q_x, q_y) labeled r(x, y), and for all states q_z such that (q_y, q_z) is a single directed edge labeled r(y, z), create the directed edge (q_x, q_z) and label it [r(x, y)r(y, y)*r(y, z)], where r(y, y) is the regular expression labeling the possible loop at state q_y, while deleting the state q_y and all its associated edges," and so on, would please some people more, but would not help anyone go from a state of not understanding the algorithm to a state of understanding it.

There is one logical possibility that we have not accounted for in the description of the algorithm given above; that is, when we finish step 3, there may be no path left at all that connects - to +. In this case, we say that the original machine accepted no words, which means that it accepted only the null language φ, whose regular expression has no symbols. We shall consider the logical consequences of this possibility in a later chapter; at the moment, all it means is that completing the algorithm guarantees producing a regular expression for all machines that accept a language and no expression for those that do not.


EXAMPLE


Consider the TG 



Eliminating the states in the order 3, 2, 1 gives this procession of TGs: 
CHAPTER 7 Kleene’s Theorem


accepts only the word x.

One FA that accepts only Λ is


It would be easier to design these machines as TGs, but it is important to keep them as FAs.


Rule 2 If there is an FA called FA1 that accepts the language defined by the regular expression r1, and there is an FA called FA2 that accepts the language defined by the regular expression r2, then there is an FA that we shall call FA3 that accepts the language defined by the regular expression (r1 + r2).


Proof of Rule 2

We are going to prove Rule 2 by showing how to construct the new machine in the most reasonable way from the two old machines. We shall prove FA3 exists by showing how to construct it.

Before we state the general principles, let us demonstrate them in a specific example. Suppose we have the machine FA1 pictured below, which accepts the language of all words over the alphabet Σ = {a, b} that have a double a somewhere in them


If we had not seen how they were derived, we might have no clue as to whether these 
regular expressions define the same language. 


CONVERTING REGULAR EXPRESSIONS INTO FAs


Proof of Part 3 

The proof of this part will be by recursive definition and constructive algorithm at the same time. This is the hardest part of our whole theorem, so we shall go very slowly.

We know that every regular expression can be built up from the letters of the alphabet and Λ by repeated application of certain rules: addition, concatenation, and closure. We shall see that as we are building up a regular expression, we could at the same time be building up an FA that accepts the same language.

We present our algorithm recursively. 

Rule 1 There is an FA that accepts any particular letter of the alphabet. There is an FA that accepts only the word Λ.


and the familiar machine FA2, which accepts all words that have both an even number of total a's and an even number of total b's (EVEN-EVEN)


Proof of Rule 1 

If x is in Σ, then the FA


We shall show how to design a machine that accepts both sets. That is, we shall build a machine that accepts all words that either have an aa or are in EVEN-EVEN and rejects all strings with neither characteristic.

The language the new machine accepts will be the union of these two languages. We shall call the states in this new machine z1, z2, z3, and so on, for as many as we need. We shall define this machine by its transition table.







Our guiding principle is this: The new machine will simultaneously keep track of where the input would be if it were running on FA1 alone and where the input would be if it were running on FA2 alone.

First of all, we need a start state. This state must combine x1, the start state for FA1, and y1, the start state for FA2. We call it z1. If the string were running on FA1, it would start in x1, and if on FA2, in y1.

All z-states in the FA3 machine carry with them a double meaning: they keep track of which x state the string would be in and which y state the string would be in. It is not that we are uncertain about which machine the input string is running on; it is running on both FA1 and FA2, and we are keeping track of both games simultaneously.

What new states can occur if the input letter a is read? If the string were being run on the first machine, it would put the machine into state x2. If the string were running on the second machine, it would put the machine into state y3. Therefore, on our new machine it puts us into state z2, which means either x2 or y3, in the same way that z1 means either x1 or y1. Because y1 is a final state for FA2, z1 is also a final state in the sense that any word whose path ends there on the z-machine would be accepted by FA2.




On the machine FA3, we are following both the path the input would make on FA1 and the input's path on FA2 at the same time. By keeping track of both paths, we know, when the input string ends, whether or not it has reached a final state on either machine.

Let us not consider this "x or y" disjunction as a matter of uncertainty. We know for a fact that the same input is running on both machines; we might equivalently say "x and y." We may not know whether a certain person weighed 100 or 200 lb to start with, but we are certain that after gaining 20 lb, then losing 5, and then gaining 1, his total weight is now exactly either 116 or 216 lb. So, even if we do not know in which initial state the string started, we can still be certain that, given a known sequence of transformations, it is now definitely in one of two possible conditions.

If we are in state z1 and we read the letter b, then being in x1 on FA1 and reading a b returns us to x1, whereas being in y1 on FA2 and reading a b sends us to y2.

z3 = x1 or y2


The beginning of our transition table for FA3 is

        a     b
±z1     z2    z3
Suppose that somehow we have gotten into state z2 and then we read an a. If we were in FA1, we would now go to state x3, which is a final state. If we were in FA2, we would now go back to y1, which is also a final state. We will call this condition z4, meaning either x3 or y1. Because this string could now be accepted on one of these two machines, z4 is a final state for FA3. As it turns out, in this example the word is accepted by both machines at once, but this is not necessary. Acceptance by either machine FA1 or FA2 is enough for acceptance by FA3. Membership in either language is enough to guarantee membership in the union.

If we are in state z2 and we happen to read a b, then in FA1 we are back to x1, whereas in FA2 we are in y4. Call this new condition z5 = state x1 or y4.

+z4 = x3 or y1
z5 = x1 or y4
At this point, our transition table looks like this:

        a     b
±z1     z2    z3
z2      z4    z5




What happens if we start from state z3 and read an a? If we were in FA1, we are now in x2; if in FA2, we are now in y4. This is a new state in the sense that we have not encountered this combination of x and y before; call it state z6.

z6 = x2 or y4

What if we are in z3 and we read a b? In FA1 we stay in x1, whereas in FA2 we return to y1. This means that if we are in z3 and we read a b, we return to state z1. This is the first time we have not had to create a new state. If we never got any use out of the old states, the machine would grow ad infinitum.



Our transition table now looks like this:

        a     b
±z1     z2    z3
z2      z4    z5
z3      z6    z1

What if we are in z4 and we read an a? If we are tracing FA1, the input remains in x3, whereas if we are tracing the input on FA2, it goes to y3. This is a new state; call it z7. If we are in z4 and we read a b, the FA1 part stays at x3, whereas the FA2 part goes to y2. This is also a new state; call it z8.

+z7 = x3 or y3

+z8 = x3 or y2

Both of these are final states because a string ending here on the z-machine will be accepted by FA1, because x3 is a final state for FA1.

If we are in z5 and we read an a, we go to x2 or y2, which we shall call z9.

If we are in z5 and we read a b, we go to x1 or y3, which we shall call z10.

z9 = x2 or y2

z10 = x1 or y3

If we are in z6 and we read an a, we go to x3 or y2, which is our old z8.

If we are in z6 and we read a b, we go to x1 or y3, which is z10 again.

If we are in z7 and we read an a, we go to x3 or y1, which is z4 again.

If we are in z7 and we read a b, we go to x3 or y4, which is a new state, z11.

+z11 = x3 or y4

If we are in z8 and we read an a, we go to x3 or y4 = z11.

If we are in z8 and we read a b, we go to x3 or y1 = z4.

If we are in z9 and we read an a, we go to x3 or y4 = z11.

If we are in z9 and we read a b, we go to x1 or y1 = z1.





If we are in z10 and we read an a, we go to x2 or y1, which is our last new state, z12.

+z12 = x2 or y1

If we are in z10 and we read a b, we go to x1 or y4 = z5.





If a string traces through this machine and ends up at a final state, it means that it would end at a final state either on machine FA1 or on machine FA2. Also, any string accepted by either FA1 or FA2 will be accepted by this FA3.
ALGORITHM

The general description of the algorithm we employed earlier is as follows. Starting with two machines, FA1 with states x1, x2, x3, ... and FA2 with states y1, y2, y3, ..., build a new machine FA3 with states z1, z2, z3, ..., where each z is of the form "x_something or y_something." The combination state x_start or y_start is the - state of the new FA. If either the x part or the y part is a final state, then the corresponding z is a final state. To go from one z to another by reading a letter from the input string, we see what happens to the x part and the y part and go to the new z accordingly. We could write this as a formula:

z new after letter p = [x new after letter p on FA1] or [y new after letter p on FA2]

Because there are only finitely many x's and y's, there can be only finitely many possible z's. Not all of them will necessarily be used in FA3 if no input string beginning at - can get to them. In this way, we can build a machine that can accept the sum of two regular expressions if we already know machines to accept each of the component regular expressions separately. ■
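The construction can be sketched in Python; like the hand computation, the sketch creates z-states only as they are reached, so unreachable combinations never appear. The assumptions are ours: each FA is a dictionary with keys 'start', 'finals', and 'delta' (a map from (state, letter) to state), a z-state is the pair (x, y), and the function names are hypothetical. The EVEN-EVEN transitions are read off from the hand computation above (y1 has both counts even, y2 an odd number of b's, y3 an odd number of a's, y4 both odd).

```python
from collections import deque


def union_fa(fa1, fa2, alphabet="ab"):
    """Build FA3 for the union language, creating z-states only as needed.

    Each FA is a dict with 'start', 'finals' (a set) and 'delta', a dict
    mapping (state, letter) -> state. A z-state is the pair (x, y).
    """
    start = (fa1["start"], fa2["start"])
    order = [start]                    # z1, z2, ... in order of discovery
    seen, delta, queue = {start}, {}, deque([start])
    while queue:
        x, y = queue.popleft()
        for c in alphabet:
            # z new after letter c = (x new on FA1, y new on FA2)
            z = (fa1["delta"][(x, c)], fa2["delta"][(y, c)])
            delta[((x, y), c)] = z
            if z not in seen:
                seen.add(z)
                order.append(z)
                queue.append(z)
    # A z-state is final if either its x part or its y part is final.
    finals = {(x, y) for (x, y) in order
              if x in fa1["finals"] or y in fa2["finals"]}
    return {"start": start, "finals": finals, "delta": delta, "states": order}


def accepts(fa, word):
    """Run a deterministic FA (in the dict format above) on word."""
    state = fa["start"]
    for c in word:
        state = fa["delta"][(state, c)]
    return state in fa["finals"]


double_a = {"start": "x1", "finals": {"x3"}, "delta": {
    ("x1", "a"): "x2", ("x1", "b"): "x1",
    ("x2", "a"): "x3", ("x2", "b"): "x1",
    ("x3", "a"): "x3", ("x3", "b"): "x3"}}
even_even = {"start": "y1", "finals": {"y1"}, "delta": {
    ("y1", "a"): "y3", ("y1", "b"): "y2",
    ("y2", "a"): "y4", ("y2", "b"): "y1",
    ("y3", "a"): "y1", ("y3", "b"): "y4",
    ("y4", "a"): "y2", ("y4", "b"): "y3"}}

fa3 = union_fa(double_a, even_even)
print(len(fa3["states"]))  # 12
```

Because the queue processes states first come, first served, reading a before b, numbering the discovered pairs in order reproduces the names z1 through z12 used above.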


EXAMPLE (Inside the proof of Theorem 6) 


Let us go through this very quickly once more on the two machines


FA1 accepts all words with a double a in them, and FA2 accepts all words ending in b. The machine that accepts the union of the two languages for these two machines begins:


In z1, if we read an a, we go to x2 or y1 = z2.
In z1, if we read a b, we go to x1 or y2 = z3.

The partial picture of this machine is now 


which is a final state since 


















z., which is a final state because x , is, 
Let M2 be the machine below that accepts all words with an odd number of letters (odd


Using the algorithm, we produce the machine below, which accepts all words that either have an odd number of letters or end in a


The only state that is not a + state is the - state. To get back to the start state, a word must have an even number of letters and end in b. ■


EXAMPLE (Inside the proof of Theorem 6)


Let FA1 be the machine below, which accepts all words ending in a, and let FA2 be


which accepts all words ending in b.
Using the algorithm, we produce
This is a perfectly possible FA for the union language FA1 + FA2. However, on inspection we see that its lower right-hand state is completely useless because it can never be entered by any string starting at -. It is not against the definition of an FA to have such a useless state, nor is it a crime. It is simply an example of the tradeoff between constructing states on our need-to-have policy versus the more universal-seeming all-at-once strategy. By either algorithm, this concludes the proof of Rule 2.
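The difference between the two strategies shows up clearly in a small sketch. The state names below are our own assumption, since the figures are not reproduced here: FA1 is in x2 exactly when the last letter read was a, and FA2 is in y2 exactly when the last letter read was b. The all-at-once strategy builds every pair of states; a reachability search from the start pair shows which of them some input string can actually enter. The helper names are hypothetical.

```python
from itertools import product


def reachable_pairs(fa1, fa2, alphabet="ab"):
    """Pairs (x, y) that some input string can reach from the start pair."""
    start = (fa1["start"], fa2["start"])
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for c in alphabet:
            z = (fa1["delta"][(x, c)], fa2["delta"][(y, c)])
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def states_of(fa):
    """All states that have outgoing edges in the transition dict."""
    return {s for (s, _) in fa["delta"]}


ends_in_a = {"start": "x1", "finals": {"x2"}, "delta": {
    ("x1", "a"): "x2", ("x1", "b"): "x1", ("x2", "a"): "x2", ("x2", "b"): "x1"}}
ends_in_b = {"start": "y1", "finals": {"y2"}, "delta": {
    ("y1", "a"): "y1", ("y1", "b"): "y2", ("y2", "a"): "y1", ("y2", "b"): "y2"}}

# The all-at-once strategy: every combination of an x-state with a y-state.
all_at_once = set(product(states_of(ends_in_a), states_of(ends_in_b)))
print(all_at_once - reachable_pairs(ends_in_a, ends_in_b))  # {('x2', 'y2')}
```

The one pair left over, presumably the useless state mentioned above, is (x2, y2): entering it would require the last letter read to be both a and b, which is why no string starting at - can get there.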

We still have two rules to go.

Rule 3 If there is an FA1 that accepts the language defined by the regular expression r1 and an FA2 that accepts the language defined by the regular expression r2, then there is an FA3 that accepts the language defined by the concatenation r1r2, the product language.


Proof of Rule 3

Again, we shall verify this rule by a constructive algorithm. We shall prove that such an FA3 exists by showing how to construct it from FA1 and FA2. As usual, first we do an illustration; then we state the general principles. But our illustration here is, first, of what can go wrong, not what to do right.

Let L1 be the language of all words with b as the second letter. One machine that accepts L1 is


Let L2 be the language of all words that have an odd number of a's. One machine for L2 is


Now consider the input string ababbaa. This is a word in the product language L1L2, because it is the concatenation of a word in L1 (ab) with a word in L2 (abbaa). If we begin to run this string on FA1, we would reach the + state after the second letter. If we could now somehow automatically jump over into FA2, we could begin running what is left of the input, abbaa, starting in the - state. This remaining input is a word in L2, so it will finish its path in the + state of FA2. Basically, this is what we want to build: an FA3 that processes the first part of the input string as if it were FA1; then when it reaches the FA1 + state, it turns into the - state on FA2. From there it continues processing the string until it reaches the + state on FA2, and we can then accept the input.

Tentatively, let us say FA3 looks something like this:


Unfortunately, this idea, though simple, does not work. We can see this by considering a different input string from the same product language. The word ababbab is also in L1L2 because abab is in L1 (it has b as its second letter) and bab is in L2 (it has an odd number of a's).

If we run the input string ababbab first on FA1, we get to the + state after two letters, but we must not say that we are finished yet with the L1 part of the input. If we stopped running on FA1 after ab, when we reached + in FA1, the remaining input string abbab could not reach + on FA2 because it has an even number of a's.

Remember that FA1 accepts all words with paths that end at a final state. They could pass through that final state many times before ending there. This is the case with the input abab. It reaches + after two letters. However, we must continue to run the string on FA1 for two more letters. We enter + three times. Then we can jump to FA2 (whatever that means) and run the remaining string bab on FA2. The input bab will then start on FA2 in the - state and finish in the + state.

Our problem is this: How do we know when to jump from FA1 to FA2? With the input ababbaa we should jump when we first reach the + in FA1. With the input ababbab (which differs only in the last letter), we have to stay in FA1 until we have looped back to the + state some number of times before jumping to FA2. How can a finite automaton, which must make a mandatory transition on each input letter without looking ahead to see what the rest of the string will be, know when to jump from FA1 to FA2?
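To see concretely why no fixed jumping rule can work, the hypothetical helper below lists every position at which an input could be split into a word of L1 followed by a word of L2. It is a brute-force membership check, not a finite automaton: it inspects the whole string at once, which is exactly what FA3 is not allowed to do.

```python
def in_L1(w):
    """L1: all words with b as the second letter."""
    return len(w) >= 2 and w[1] == "b"


def in_L2(w):
    """L2: all words with an odd number of a's."""
    return w.count("a") % 2 == 1


def jump_points(w):
    """Every i at which w splits as w[:i] in L1 followed by w[i:] in L2."""
    return [i for i in range(len(w) + 1) if in_L1(w[:i]) and in_L2(w[i:])]


print(jump_points("ababbaa"))  # [2, 6]
print(jump_points("ababbab"))  # [3, 4, 5]
```

The two words agree on their first six letters, yet a jump after two letters works only for ababbaa; for ababbab the earliest workable jump comes after three letters (aba is also in L1). Nothing in the letters read so far distinguishes the cases, so the machine must somehow keep every option open at once.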

This is a subtle point, and it involves some new ideas. 

We have to build a machine that has the characteristic of starting out like FA1 and following along it until it enters a final state, at which time an option is reached. Either we continue along FA1, waiting to reach another +, or else we switch over to the start state of FA2 and begin circulating there. This is tricky, because the r1 part of the input string can generate an arbitrarily long word if it has a star in it, and we cannot be quite sure of when to jump out of FA1 and into FA2. And what happens (heavens forfend) if FA1 has more than one +?

Now let us illustrate how to build such an FA3 for a specific example. The two machines we shall use are FA1,


the machine that accepts only strings with a double a in them, and FA2,


the machine that accepts all words that end in the letter b.


We shall start with the state z1, which is exactly like x1. It is a start state, and it means that the input string is being run on FA1 alone. Unlike the union machine, the string is not being run on FA2 yet. From z1, if we read a b, we must return to the same state x1, which is z1 again.

From z1, if we read an a, we must go to state x2 because we are interested in seeing that the first section of the input string is a word accepted by FA1. Therefore, z2 is the same as x2. From the state z2, if we read a b, we must go back to z1. Therefore, we have the relationships


The picture of FA3 starts out just like the picture of FA1:


If we are in z2 and we read an a, we must go to a new state z3, which in some ways corresponds to the state x3 in FA1. However, x3 has a dual identity. Either it means that we have reached a final state for the first half of the input as a word in the language for FA1 and it is where we cross over and run the rest of the input string on FA2, or else it is merely another state that the string must pass through to get eventually to its last state in FA1. Many strings, some of which are accepted and some of which are rejected, pass through several + states on their way through any given machine.

If we are now in z3 in its capacity as the final state of FA1 for the first part of this input string, we must begin running the rest of the input string as if it were input of FA2 beginning at state y1. Therefore, the full meaning of being in z3 is

z3 = x3, and we are still running on FA1
     or
     y1, and we have begun to run on FA2

Notice the similarity between this disjunctive (either/or) definition of z3 and the disjunctive definitions for the z-states produced by the algorithm given for the addition of two FAs. There are also significant differences, as discussed next.

If we are in state z3 and we read an a, we now have three possible interpretations for the state into which this puts us:

We are back in x3, continuing to run the string on FA1
or
we have just finished on FA1 and we are now in y1, beginning to run on FA2
or
we have looped from y1 back to y1 while already running on FA2
Thus, we have produced a machine that accepts exactly those strings that have a front section with a double a followed by a back section that ends in b. This we can see because without a double a we never get to z3, and we end in z4 only if the whole word ends in b.


ALGORITHM 

general, we can describe the algorithm for forming the machine FA 3 as follows. First, we 
«4|jake a 2 -state for every nonfinal v-state in FA { reached before ever hitting a final state on 
tFA v For each final state in FA V we establish a z-state that expresses the options that we are 
continuing on FA X or are beginning on FA r 

flg f Are in * someth)ng , which is a + state but still 

■■JSP fe continuing on FA { 

W i or 

have finished the FA X part of the input string and 
have jumped to y, to commence tracing the remainder of 
\ the input string on FA 2 

After we have reached a jump-to-FA2 state, any other state we reach has an x and a y possibility
like the z-states in the union machine, with the additional possibility that every time we hit
yet another final state on the FA1-machine, we may again exercise the option of jumping to y1.
This means that every time we pass through a final state while processing the FA1 part of the
input string, we jettison an alter ego jumping to y1 that runs around on the FA2-machine.

These little mice tracing paths on FA2 each start at y1, but at different points in the input string,
so at any future instant they may be at several different y-states on FA2. Every z-state therefore
has the nature of one and only one x-state, but a whole set of possible y-states.

So, the full nature of a z-state is

z = we are in xsomething, continuing on FA1
       or
    we are in a set of ysomethings, continuing on FA2


There are clearly only finitely many possibilities for such z-states, so FA3 is a finite machine.
The transition from one z-state to another for each letter of the alphabet is determined
uniquely by the transition rules in FA1 and FA2. One set of y's will move to another set of y's
along their a-edges or b-edges. So, FA3 is a well-defined finite automaton that clearly does
what we want; that is, it accepts only strings that first reach a final state on FA1, jump to y1,
and then reach a final state on FA2.

We still have to decide which states in the new FA are final states. Clearly, to be accepted by FA3
means to end in a final state in FA2, so any z-state is a final state if it contains a
y-machine final state as a possible position for the input. This completes the algorithm. ■
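As a rough illustration, the algorithm can be sketched in Python. The representation is our own assumption, not the book's: a machine is a hypothetical `FA` named tuple holding a start state, a set of final states, and a complete transition table keyed by (state, letter) over the alphabet {a, b}. Each z-state is stored as a pair (x-state, frozenset of y-states), only the z-states reachable from the start are generated, and a z-state is final when its set contains a final state of FA2. The sample machines encode the languages of the next example (words that start with b, and words that end with b); their state names are our own labels.

```python
from collections import namedtuple

# Hypothetical representation: start state, set of final states, and a
# complete transition table {(state, letter): next_state} over "ab".
FA = namedtuple("FA", "start finals delta")
ALPHABET = "ab"

def product(fa1, fa2):
    """Build FA3 for L(fa1)L(fa2); each z-state is (x, frozenset of y's)."""
    def jump(x, ys):
        # Reaching a final x-state spawns a new path starting at y1.
        return frozenset(ys | {fa2.start}) if x in fa1.finals else frozenset(ys)

    start = (fa1.start, jump(fa1.start, set()))
    delta, finals, todo, seen = {}, set(), [start], {start}
    while todo:
        z = todo.pop()
        x, ys = z
        if ys & fa2.finals:            # final iff some y-path is in a final state
            finals.add(z)
        for c in ALPHABET:
            nx = fa1.delta[(x, c)]
            nz = (nx, jump(nx, {fa2.delta[(y, c)] for y in ys}))
            delta[(z, c)] = nz
            if nz not in seen:
                seen.add(nz)
                todo.append(nz)
    return FA(start, finals, delta)

def accepts(fa, word):
    state = fa.start
    for c in word:
        state = fa.delta[(state, c)]
    return state in fa.finals

# FA1: words that start with b (x2 is a dead state); FA2: words that end in b.
fa1 = FA("x1", {"x3"}, {("x1", "a"): "x2", ("x1", "b"): "x3",
                        ("x2", "a"): "x2", ("x2", "b"): "x2",
                        ("x3", "a"): "x3", ("x3", "b"): "x3"})
fa2 = FA("y1", {"y2"}, {("y1", "a"): "y1", ("y1", "b"): "y2",
                        ("y2", "a"): "y1", ("y2", "b"): "y2"})
fa3 = product(fa1, fa2)
print(len({z for z, _ in fa3.delta}))          # number of z-states: 4
print([w for w in ["b", "bb", "bab", "ba", "abb"] if accepts(fa3, w)])
```

With these sample machines the sketch generates four z-states, matching z1 through z4 in the next example, and of the sample words it accepts only bb and bab; b alone is rejected because a single letter cannot serve as both factors.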
EXAMPLE (Inside the proof of Theorem 6)

Let us illustrate this algorithm to construct the machine for the product of the languages
L1, all words that start with a b, and L2, all words that end with a b.
Initially, we must begin in x1, which we shall just call z1. If we read an a, we go to x2,
which we may as well call z2. If we read a b, we go to x3, which, being a final state, means
that we have the option of jumping to y1, an option we do not necessarily have to exercise at
this moment.

z3 = x3 or y1

From z2, like x2, both an a and a b take us back to z2.

In z3, if we are in the x3 condition and we read an a, we stay in x3, or we now choose (be-
cause x3 is a final state) to jump to y1. If we were in z3 in the y1 condition already and we
read an a, we would loop back to y1 on the FA2-machine. In any of these three eventualities,
if we are in z3 and we read an a, we end up at either x3 or y1; in other words, from z3 we go
back to z3.

If we are in z3 and we read a b, a different event takes place. If the z3 meant x3, we either
stay there or use the occasion to jump to y1. If we were in z3 already in y1, then the b will
necessarily take us to y2. Therefore, we need a new state:

z4 = x3 or y1 or y2

If the input string processing ends in this state, then it should be accepted because it will
have gotten to the final state on the FA2-machine. So, z4 is a final state for FA3.

What happens if we are in z4 and we read an a?

x3 goes to x3, staying on FA1

x3 goes to x3, then jumps to y1 on FA2

y1 stays in y1

y2 goes to y1

So from z4 an a takes us to x3 or y1, which is z3.

What happens if we are in z4 and we read the input letter b?

x3 goes to x3, staying on FA1

x3 goes to x3, which jumps to y1

y1 goes to y2

y2 loops back to y2

So from z4 a b will loop us back to z4.

The complete picture of the machine then looks like this 


It is fairly evident that this is a decent machine for all words that both begin and end with
the letter b, which is what the product of the two languages would be. Notice that the
word b itself is not accepted. Even though it begins and ends with the letter b, they are
the same letter b, and therefore it cannot be factored into b-beginning and b-ending
strings.

What if we were to multiply these languages in the opposite order: all words that end in
b times all words that begin with a b? The resultant language should be that of all words with
a double b in them. To build the machine, we multiply FA2 times FA1:
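The claim that this product is the language of words containing bb can be spot-checked straight from the definition of a product language: w is in the product exactly when some split w = uv has u ending in b and v beginning with b. The Python sketch below (the helper names are ours) compares that definition with a plain substring test on every word of length at most 8. It is a finite sanity check, not a proof.

```python
from itertools import product

def words(max_len, alphabet="ab"):
    """Every word over the alphabet with length 0 through max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def in_product(w, in_first, in_second):
    """w is in the product language iff some factorization w = uv works."""
    return any(in_first(w[:i]) and in_second(w[i:]) for i in range(len(w) + 1))

def ends_in_b(u):
    return u.endswith("b")

def starts_with_b(v):
    return v.startswith("b")

mismatches = [w for w in words(8)
              if in_product(w, ends_in_b, starts_with_b) != ("bb" in w)]
print(mismatches)  # [] : the two descriptions agree on every word tried
```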


We start in z1 = y1. An a leaves us in z1, and a b takes us to

z2 = y2 or x1 (because y2 is a final state for the first machine)

From z2 an a will take us to y1 or x2, not x1, because y1 is not a final state on the first machine:

z3 = y1 or x2

From z2 a b will take us from y2 back to y2, or from y2 to y2 and on to x1, or from x1 to x3.
This is a new state:

z4 = y2 or x1 or x3

From z3 an a will take us from y1 to y1 and from x2 to x2. So, z3 has an a-loop.
From z3 a b will take us from y1 to y2, or y1 to y2 to x1, or x2 to x2. This is a new state:

z5 = y2 or x1 or x2

From z4 an a will take us from y2 to y1, or x1 to x2, or x3 to x3. This is also a new state:

z6 = y1 or x2 or x3

From z4 a b will take us from y2 to y2, or y2 to y2 to x1, or x1 to x3, or x3 to x3, which is z4.
From z5 an a will take us from y2 to y1, x1 to x2, x2 to x2, which is just z3 again.
From z5 a b will take us from y2 to y2, y2 to y2 to x1, x1 to x3, x2 to x2, which is a new state:

z7 = y2 or x1 or x2 or x3

From z6 an a will loop us back to z6 for each of its three components.
From z6 a b will take us from y1 to y2, y1 to y2 to x1, x2 to x2, x3 to x3, which is z7.
From z7 an a will take us from y2 to y1, x1 to x2, x2 to x2, x3 to x3, which is z6.
From z7 a b will take us from y2 to y2, y2 to y2 to x1, x1 to x3, x2 to x2, x3 to x3, which is z7 again.


Therefore, the machine is finished.


The only final states are those that contain the possibility of x3. It is very clear that this ma-
chine accepts all words with a double b in them, but it is obviously not the most efficient
machine to do so.

While we were working the last example, we may have begun to lose faith in the finite-
ness of the algorithm; new (and needless) states kept arising. Yet, every state of the new ma-
chine had the identity of a single y-state and a subset of x-states. There are finitely many pos-
sibilities for each of these and therefore finitely many possibilities for them jointly. The
algorithm must always work and must always terminate.


EXAMPLE (Inside the proof of Theorem 6)


Let FA1 be


which accepts the language L1 of all words that do not contain the substring aa.
Let FA2 be


which accepts the language L2 of all words with an odd number of letters.

Using the preceding algorithm, we produce the following machine to accept the
language L1L2:
All states except the - state are final states. The - state is left the instant an input letter
is read, and it can never be reentered. Therefore, the language this machine accepts is all
words but Λ. This actually is the product language L1L2, because if a word w has an odd
number of letters, we can factor it as (Λ)(w), where Λ is in L1 and w is in L2. While if it has
an even (not 0) number of letters, we factor it as

w = (first letter)(the rest)

where (first letter) must be in L1 (cannot contain aa) and (the rest) is in L2. Only the word Λ
cannot be factored into a part in L1 and a part in L2. ■
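The factorization argument can be replayed mechanically. This Python sketch (the helper names are hypothetical) applies the split described above, (Λ)(w) for odd-length words and (first letter)(the rest) for even-length ones, and asserts that the left factor is in L1 and the right factor is in L2 for every nonempty word up to length 10. Λ is excluded because every word of L2 has at least one letter.

```python
from itertools import product

def in_L1(u):
    """L1: words that do not contain the substring aa."""
    return "aa" not in u

def in_L2(v):
    """L2: words with an odd number of letters."""
    return len(v) % 2 == 1

def factor(w):
    """The factorization from the text; w must be nonempty."""
    if len(w) % 2 == 1:
        return "", w          # (Λ)(w): Λ is in L1 and w is in L2
    return w[:1], w[1:]       # (first letter)(the rest)

for n in range(1, 11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        u, v = factor(w)
        assert u + v == w and in_L1(u) and in_L2(v), w
print("every nonempty word up to length 10 splits into L1 times L2")
```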


We are now ready for our last rule. 

Rule 4 If r is a regular expression and FA1 is a finite automaton that accepts exactly
the language defined by r, then there is an FA called FA2 that will accept ex-
actly the language defined by r*.


Proof of Rule 4

The language defined by r* must always contain the null word. To accept the null string Λ,
we must indicate that the start state is also a final state. This could be an important change in
the machine FA1, because strings that return to x1 might not have been accepted before. They
may not be in the language of the expression r. The building of our new machine must be
done carefully.

We shall, as in the other cases, first illustrate the algorithm for manufacturing this ma-
chine with a simple example. We cannot use most of the examples we have seen recently
because their closure is not different from themselves (except for the possibility of includ-
ing the word Λ). This is just a curious accident of these examples and not usual for regular
expressions. The concatenation of several strings of words ending in b is itself a word end-
ing in b. The concatenation of several strings containing aa is itself a string containing aa.
The concatenation of arbitrarily many EVEN-EVEN strings is itself an EVEN-EVEN
string.

Let us consider the regular expression

r = a* + aa*b

The language defined by r is all strings of only a's and the strings of some (not 0) a's ending
in a single b. The closure of this language is defined by (a* + aa*b)*, which includes all
words in which each b has an a on its left. Here, r* is clearly not equal to r, because such
words as aba and ababaaa are in r* but not in the language of r. The language of r* is all
strings without a double b that do not begin with b.
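The description of r* can be checked against a regular expression engine by brute force. This sketch assumes Python's `re` module, where the union written + in this book is written |. It compares `fullmatch` on (a*|aa*b)* with the description "no double b and no leading b" for every word of length at most 10, and confirms that aba and ababaaa are not in the language of r itself. Agreement on short words is evidence, not a proof.

```python
import re
from itertools import product

r = re.compile(r"a*|aa*b")          # r = a* + aa*b  (| is Python's union)
r_star = re.compile(r"(a*|aa*b)*")  # r* = (a* + aa*b)*

def described(w):
    """The text's description of r*: no double b, and no leading b."""
    return "bb" not in w and not w.startswith("b")

bad = []
for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        if (r_star.fullmatch(w) is not None) != described(w):
            bad.append(w)
print(bad)                                                 # []
print([w for w in ("aba", "ababaaa") if r.fullmatch(w)])   # [] : not in r
```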
ALGORITHM (incomplete)

The general rule for this algorithm is that each z-state corresponds to some collection of
x-states. We must remember that each time we reach a final state it is possible that we have to
start over again at x1. There are only finitely many possible collections of x-states, so the ma-
chine produced by this algorithm has only finitely many states. The transitions from one col-
lection of x-states to another based on reading certain input letters are determined completely
by the transition rules for FA1. ■


This is not actually a bad machine for the language defined by 


(a* + aa*b)* 



If we ever get to z2, the total input is to be rejected, so we stay at z2. We know this me-
chanically (which means here that we know it without any intelligent insight, which is im-
portant because we should never need anything that the algorithm does not automatically
provide) because x4 loops back to x4 by a and b and therefore z2 must do the same.

If we are in z3 and we read a b, we go different places depending on which clause in the
definition of z3 was meant in a particular case. If z3 meant x2, we now go to x3, but if z3 meant
that we are back in x1, then we now go to x4. Therefore, we have a new state. However, even
when we are in x3 we could be there in two ways. We could be continuing to run a string on
FA1 and proceed as normal, or else we could have just accepted a part of the string and we
are starting to process the next section from scratch at x1. Therefore, z4 has a triple meaning:


+z4 = x1 or x3 or x4


Because x3 is an accept state, z4 can also accept a string that ends its path there.
Where do we go if we are in z3 and we read an a? If we were in x2, we stay there,
whereas if we were back in x1, we would go to x2. Remember again that every + state is also
automatically a possible restart state jumping back to x1. Therefore, we return to z3.

If we are in z4 and we read a b, whether we are in x1, x3, or x4, we definitely go to x4,
which is z2.

If we are in z4 and we read an a, we go (if we were in x1) to x2, or (if we were in x3) to
x4, or (if we were in x4) to x4. Because x2 is a final state, we also keep the option of
restarting at x1. Therefore, we are in a new state:

+z5 = x1 or x2 or x4

which must be a final state because x2 is.

From z5 an a gets us to (x1 or x2 or x4), which is z5 itself, whereas a b gets us to (x1 or x3
or x4), which is z4 again.

This finishes the description of the whole machine. It is pictured below: 
We could have corrected the problem by using a combination of Rules 1 and 2 as follows. From
Rule 1 we could take the FA that accepts only Λ, and from Rule 2 we could have added that
machine to the FA* produced by the algorithm and thus patched up the problem by adding the
missing word. This new machine would have not just one additional state, but could have as
many as twice the number of states in FA*. That makes this suggestion a wasteful but math-
ematically adequate resolution. Either way, the algorithm is now complete and correct.


ALGORITHM (for real) 

Given an FA whose states are x1, x2, . . . , an FA that accepts the Kleene closure of the lan-
guage of the original machine can be built as follows:

Step 1 Create a state for every subset of x’s. Cancel any subset that contains a final 
x-state, but does not contain the start state. 

Step 2 For all the remaining nonempty states, draw an a-edge and a b-edge to the col-
lection of x-states reachable in the original FA from the component x's by
a- and b-edges, respectively.

Step 3 Call the null subset a ± state and connect it to whatever states the original start
state is connected to by a- and b-edges, even possibly the start state itself.

Step 4 Finally, put + signs in every state containing an x-component that is a final 
state of the original FA. ■ 


This algorithm will always produce an FA, and the FA it produces satisfies our requirements. 
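A compact Python sketch of these steps follows. Two choices are our own assumptions rather than the book's: only the subsets reachable from the null subset are generated (instead of creating every subset and cancelling some), and Step 1's cancellation rule is read as "whenever a collection contains a final x-state, add the start state to it." Machines are represented by a hypothetical `FA` named tuple (start state, final states, complete transition table keyed by (state, letter)). The sample machine, with our own state names, accepts the words with an odd number of b's.

```python
from collections import namedtuple

# Hypothetical representation: start state, set of final states, and a
# complete transition table {(state, letter): next_state} over "ab".
FA = namedtuple("FA", "start finals delta")
ALPHABET = "ab"

def closure(fa):
    """Build a machine for L(fa)* whose states are frozensets of x-states."""
    def add_restart(xs):
        # Step 1 (our reading): a collection that holds a final x-state must
        # also hold the start state, since a factor may have just ended.
        return frozenset(xs | {fa.start}) if xs & fa.finals else frozenset(xs)

    null = frozenset()                       # Step 3: the null subset is ±
    delta, finals, todo, seen = {}, set(), [null], {null}
    while todo:
        s = todo.pop()
        if not s or s & fa.finals:           # Step 4, plus the ± state
            finals.add(s)
        sources = s if s else {fa.start}     # Step 3: move like the old start
        for c in ALPHABET:                   # Step 2: follow every component
            t = add_restart({fa.delta[(x, c)] for x in sources})
            delta[(s, c)] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return FA(null, finals, delta)

def accepts(fa, word):
    state = fa.start
    for c in word:
        state = fa.delta[(state, c)]
    return state in fa.finals

# Sample machine (our labels): x1 is the start, x2 is final, a loops on
# both states, and b switches between them, so it accepts an odd number of b's.
odd_b = FA("x1", {"x2"}, {("x1", "a"): "x1", ("x1", "b"): "x2",
                          ("x2", "a"): "x2", ("x2", "b"): "x1"})
star = closure(odd_b)
print(len({s for s, _ in star.delta}))                   # 3
print([w for w in ["", "a", "aa", "b", "ab", "bb", "abab"] if accepts(star, w)])
```

For the sample machine the sketch builds three states (the null subset, {x1}, and {x1, x2}) and accepts Λ, b, ab, bb, and abab while rejecting a and aa, in line with the hand construction for this machine worked out later in the section.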


EXAMPLE 


Consider the regular expression

r = aa*bb*

This defines the language of all words where all the a's (of which there is at least one)
come before all the b's (of which there is at least one).

One FA that accepts this language is 


Let us consider the language defined by r*:

r* = (aa*bb*)*

This is a collection of a's, then b's, then a's, then b's, and so on. Most words fit this pat-
tern. In fact, the only strings not in this language are those that start with a b and those that
end with an a. All other strings are words defined by r*. Thus, r* is almost equivalent to

a(a + b)*b

(the only difference is the word Λ, which is in r*). For example, aababbb is in r* because (aab)
is in r and (abbb) is in r. (Every string in r* can be uniquely factored into its substrings of
type r, but this is a side issue.) The string abba is definitely not in r* because it ends in a.

Now let us build an FA for r*. Let us first see what goes wrong if we try to follow the
incomplete form of the algorithm. We begin with the start state:

z1 = x1

Reading an a takes us to

z2 = x2

Reading a b in state z1 takes us to

z3 = x3

Like its counterpart x3, z3 is a point of no return (abandon all hope, ye that enter).
From z2 if we read an a, we return to z2, just as with x2. From z2 if we read a b, we pro-
ceed to a new state called z4.

However, z4 is not just x4. Why? Because when we are processing the string abab and
we get to z4, we may have just accepted the first factor (ab) as being of the form r and be
about to process the second factor starting again in the state x1. On the other hand, if we are
processing the string abbab and we have only read the first two letters, even though we are
in z4, we have not completed reading the whole first factor of type r. Therefore,

+z4 = x1 or x4

Because it is possible to end here and accept a string, this must be a final state, but we
also have the option of continuing to read another factor (substring) of type r, or to finish read-
ing a factor we are in the middle of. If we are in z4 and we read an a, we go to x3 (if we were
in x4) or x2 (if we were in x1). Therefore, we could say that we are going to a new state:

z5 = x2 or x3

However, the option of being in x3 is totally worthless. If we ever go there, we cannot accept
the string. Remember x3 is Davy Jones's locker. No string that gets there ever leaves or is
ever accepted. So, if we are interested in the paths by which strings can be accepted, we
need only consider that when in z4, if we read an a, it is because we were in the x1 part of z4,
not the x4 part. This a, then, takes us back to z2. (This is a touch of extra insight not actually
provided by the algorithm. The algorithm requires us blindly to form a new state, z5. We
shall build both machines, the smart one and the algorithm one.)

If we are in z4 and we read a b, we go to x4 (if we were in x4) or x3 (if we were in x1).
Again, we need not consider the option of going to x3 (the suicide option), because a path
going there could accept no words. So, instead of inventing a new state,

z6 = x3 or x4

which the preceding algorithm tells us to construct, we can simply assume that from z4 a b
always takes us to x4. This is, of course, really the combination (x4 or x1) because we could
now continue the processing of the next letter as if it were in the state x1, having just accepted
a factor of type r. This is the case with the word abbab.

These options, x1 or x4, are already the definition of state z4, so we have finished our ma-
chine.

If we had mechanically followed the algorithm in the proof, we would have constructed
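The statement that r* differs from a(a + b)*b only by the word Λ can be spot-checked by brute force. This sketch assumes Python's `re` module (with | standing for the book's +) and compares (aa*bb*)* with Λ + a(a + b)*b on every word of length at most 10.

```python
import re
from itertools import product

r_star = re.compile(r"(aa*bb*)*")    # r* = (aa*bb*)*
a_to_b = re.compile(r"a(a|b)*b")     # a(a + b)*b

bad = []
for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        in_r_star = r_star.fullmatch(w) is not None
        in_alt = w == "" or a_to_b.fullmatch(w) is not None  # Λ + a(a + b)*b
        if in_r_star != in_alt:
            bad.append(w)
print(bad)                                                  # []
print(r_star.fullmatch("") is not None, a_to_b.fullmatch("") is not None)
```

The last line prints True False, showing that Λ is in r* but not in a(a + b)*b.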
Let us practice our algorithm on this machine.

The first state we want is z1, which must be like x1 except that it is also a final state. If
we are in z1 and we read an a, we come back to x1, but this time in its capacity as a nonfinal
state. We have to give a different name to this state; let us call it z2.

z1 = x1 and a final state
z2 = x1 and a nonfinal state


If we are in z1 and we read a b, we must go to a state like x2. Now because x2 is a final
state, we must also include the possibility that once we enter x2, we immediately proceed as
if we were back in x1. Therefore, the state z3 that we go to is simply x1 or x2, and it is a final
state because of x2.

At this point, the machine looks like this: 


If we are in z2 and we read an a, we stay in z2. If we are in z2 and we read a b, we go to
z3. If we are in z3 and we read an a, it will take us back to z3, because if we were in x1, we
would stay in x1, and if we were in x2, we would stay in x2. If we are in z3 and we read a b,
then we also return to z3, because if we were in x1, then we would go to x2, and if we were in
x2, we would go to x1. The whole machine is shown on the next page:
The only words not accepted by this machine are words of solid a's. All other words are
clearly the concatenation of substrings with one b each and are therefore in the closure of the
language of FA1. ■

This is another example of how the null string is a royal pain in the neck. One regular
expression defining the language of all words with an odd number of b's is

r = a*b(a*ba*b)*a*

Therefore, the regular expression

r* = [a*b(a*ba*b)*a*]*

defines the language of all words that are not of the form aa*. Another regular expression for
this language is

Λ + (a + b)*b(a + b)*

Therefore,

Λ + (a + b)*b(a + b)* = [a*b(a*ba*b)*a*]*

It is hard to imagine an algebraic proof of this equation. The problem of determining whether
two regular expressions define the same language will be discussed in Chapter 11.
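Although an algebraic proof is hard to come by, the equation can at least be tested on short words. This sketch, assuming Python's `re` module with | standing for +, checks that a*b(a*ba*b)*a* matches exactly the words with an odd number of b's and that the two sides of the equation agree on every word of length at most 10. A finite check of this kind cannot replace the decision procedure discussed in Chapter 11.

```python
import re
from itertools import product

odd_b = re.compile(r"a*b(a*ba*b)*a*")      # r = a*b(a*ba*b)*a*
rhs = re.compile(r"(a*b(a*ba*b)*a*)*")     # r* = [a*b(a*ba*b)*a*]*
lhs = re.compile(r"(a|b)*b(a|b)*")         # (a + b)*b(a + b)*; Λ handled below

odd_ok, eq_ok = True, True
for n in range(11):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        # r should match exactly the words with an odd number of b's.
        odd_ok &= (odd_b.fullmatch(w) is not None) == (w.count("b") % 2 == 1)
        # Λ + (a + b)*b(a + b)* should agree with r* on every word.
        in_lhs = w == "" or lhs.fullmatch(w) is not None
        eq_ok &= in_lhs == (rhs.fullmatch(w) is not None)
print(odd_ok, eq_ok)                       # True True
```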


We have now developed algorithms that, when taken together, finish the proof of Part 3
of Kleene's theorem. (We have been in the middle of this project for so long it is possible to
lose our perspective.)

Because of Rules 1, 2, 3, and 4, we know that all regular expressions have correspond-
ing finite automata that give the same language. This is because while we are building up the
regular expression from the elementary building blocks by the recursive definition, we can
simultaneously be building the corresponding FA from the four preceding algorithms. This is
a powerful example of the strength of recursive definitions.

As an example, suppose we want to find an FA to accept the language for the regular ex-
pression (ab)*a(ab + a*)*. Because this is a regular expression, it can be built up by re-
peated applications of the rules: any letter, sum, product, star.

The lengthy process of expression and machine-building can proceed as follows: a is
a letter in the alphabet, so there is an FA that accepts it called FA1. Now b is a letter in the
alphabet, so there is a machine that accepts it, FA2. Then ab is the language of the prod-
uct of the two machines FA1 and FA2, so there is a machine to accept it, FA3. Then (ab)*
is the language of the closure of the machine FA3, so there is a machine to accept it; call
it FA4.

Now a* is the language of the closure of the machine FA1, so there is an FA to accept it
called FA5. Now ab + a* is the language of the sum of FA3 and FA5, so there is a machine to
accept it, FA6. Now (ab + a*)* is the language of the closure of FA6; therefore, there is a ma-
chine to accept it, FA7. Now a(ab + a*)* is the product of FA1 and FA7, so there is a machine
to accept it, FA8. Now (ab)*a(ab + a*)* is the product of machines FA4 and FA8; call it FA9.
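The recursive construction can be sketched end to end in Python. This is a minimal sketch under our own assumptions: machines are complete DFAs over {a, b} stored in a hypothetical `FA` class; Rule 2 is implemented by running both machines in lockstep on pairs of states, Rule 3 by the (x-state, set of y-states) construction, and Rule 4 by sets of x-states with the empty set as the ± start state. Only reachable states are generated, and all function names are ours. The variables fa1 through fa9 follow the order used in the text, and the result is compared with Python's `re` module on every word of length at most 8.

```python
import re
from itertools import product

ALPHABET = "ab"

class FA:
    """A complete DFA: start state, final states, (state, letter) -> state."""
    def __init__(self, start, finals, delta):
        self.start, self.finals, self.delta = start, finals, delta

    def accepts(self, word):
        state = self.start
        for c in word:
            state = self.delta[(state, c)]
        return state in self.finals

def build(start, step, is_final):
    """Generate the reachable part of a machine from a start state and a rule."""
    delta, finals, todo, seen = {}, set(), [start], {start}
    while todo:
        s = todo.pop()
        if is_final(s):
            finals.add(s)
        for c in ALPHABET:
            t = step(s, c)
            delta[(s, c)] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return FA(start, finals, delta)

def letter(x):                     # Rule 1: accepts only the word x
    return build("-", lambda s, c: "+" if (s, c) == ("-", x) else "dead",
                 lambda s: s == "+")

def union(m1, m2):                 # Rule 2: run both machines in lockstep
    return build((m1.start, m2.start),
                 lambda s, c: (m1.delta[(s[0], c)], m2.delta[(s[1], c)]),
                 lambda s: s[0] in m1.finals or s[1] in m2.finals)

def concat(m1, m2):                # Rule 3: one x-state and a set of y-states
    def jump(x, ys):
        return frozenset(ys | {m2.start}) if x in m1.finals else frozenset(ys)

    def step(s, c):
        x = m1.delta[(s[0], c)]
        return (x, jump(x, {m2.delta[(y, c)] for y in s[1]}))

    return build((m1.start, jump(m1.start, set())), step,
                 lambda s: bool(s[1] & m2.finals))

def star(m):                       # Rule 4: sets of x-states, empty set is ±
    def restart(xs):
        return frozenset(xs | {m.start}) if xs & m.finals else frozenset(xs)

    def step(s, c):
        return restart({m.delta[(x, c)] for x in (s or {m.start})})

    return build(frozenset(), step, lambda s: not s or bool(s & m.finals))

# Build (ab)*a(ab + a*)* in the same order as the text, FA1 through FA9.
fa1, fa2 = letter("a"), letter("b")
fa3 = concat(fa1, fa2)             # ab
fa4 = star(fa3)                    # (ab)*
fa5 = star(fa1)                    # a*
fa6 = union(fa3, fa5)              # ab + a*
fa7 = star(fa6)                    # (ab + a*)*
fa8 = concat(fa1, fa7)             # a(ab + a*)*
fa9 = concat(fa4, fa8)             # (ab)*a(ab + a*)*

target = re.compile(r"(ab)*a(ab|a*)*")
words = ["".join(p) for n in range(9) for p in product(ALPHABET, repeat=n)]
print(all(fa9.accepts(w) == (target.fullmatch(w) is not None) for w in words))
```

The program prints True when the machine built by the four rules and the regular expression agree on all of those words. None of the machines is minimized, so they can have many more states than necessary.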


All regular expressions can be handled the same way. We have shown that every lan-
guage accepted by an FA can be accepted by a TG, every language accepted by a TG can be
defined by a regular expression, and every language defined by a regular expression can be
accepted by an FA. This concludes the proof of all of Kleene's theorem.


The proof has been constructive, which means that we have not only shown that there
is a correspondence between regular expressions, FAs, and TGs, but we have also shown ex-
actly how to find examples of the things that correspond. Given any one, we can build the
other two using the techniques outlined in the preceding proof.

Because TGs seem more understandable, we often work with them instead of struggling
with the rigors of FAs (especially having to specify what happens in every state to every
letter).

The biggest surprise of this theorem may be that TGs are not any more powerful than
FAs in the sense that there are no extra languages that TGs can accept that FAs could not
handle already. This is too bad because we shall soon show that there are some languages
that FAs cannot accept, and we shall need a more powerful type of machine than a TG to
deal with them.

Even though with a TG we had the right to exercise some degree of judgment (we
made some decisions about sectioning the reading of the input string), we could do no bet-
ter than a purely automatic robot like an FA. The human input factor was worth essentially
nothing.


NONDETERMINISTIC FINITE AUTOMATA


Now that we have shown how a possibly nondeterministic machine like a TG can be turned
(by a deterministic algorithmic procedure) into a deterministic machine, an FA, we may in-
troduce a conceptual machine that occurs in practice more frequently than the TG, but that
shares with it the property of being nondeterministic.


DEFINITION 

A nondeterministic finite automaton is a TG with a unique start state with the property
that each of its edge labels is a single alphabet letter. It is given the acronym NFA. Some-
times, to distinguish them from NFAs, the regular deterministic finite automata are referred
to as DFAs. ■


We defined NFAs as a type of TGs, but we might just as easily have started with the
concept of FA and expanded their scope by allowing arbitrarily many a- and b-edges coming
out of each state. The result would be the same, but then we would have to restate the notion
of acceptance of an input string for a nondeterministic machine as the existence of any one
possible path to +. We would also have to rehash the possibility of crashing and its incon-
clusive consequences.
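Acceptance as the existence of any one path to + can be simulated without listing paths, by tracking the set of states that some path could be in after each letter. In this Python sketch the NFA is a hypothetical one of our own for the language {bb, bbb} (two b-edges leave the start state), stored as a dictionary from (state, letter) to a set of next states; a missing entry means there is no edge, so a path that needs it crashes. A word is rejected only when every path has crashed or ended outside the final states.

```python
# Hypothetical NFA for {bb, bbb}: two b-edges leave the start state "-".
# Missing entries mean "no edge", so a path that needs one simply crashes.
NFA_EDGES = {
    ("-", "b"): {"p", "r"},
    ("p", "b"): {"q"},
    ("r", "b"): {"s"},
    ("s", "b"): {"t"},
}
START, FINALS = "-", {"q", "t"}

def nfa_accepts(word, edges=NFA_EDGES, start=START, finals=FINALS):
    """Accept iff at least one path labeled `word` ends in a final state."""
    current = {start}                      # every state some path could be in
    for c in word:
        current = set().union(*(edges.get((s, c), set()) for s in current))
        if not current:                    # every path has crashed
            return False
    return bool(current & finals)

print([w for w in ["", "b", "bb", "bbb", "bbbb", "ab"] if nfa_accepts(w)])
# ['bb', 'bbb']
```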


The advantage of this NFA is that whether looping in state 7 occurs for a given input is
recorded in whether the path the input follows goes through state 7' or not. In a computer
program, state 7' may set a flag alerting one to the incidence of looping.


If we are looking for a machine to define the language of all strings with a triple a followed
by a triple b, we could design the NFA:


There is one thing that we must notice about this machine: it will also accept words in
which some bbb can occur before the first aaa (by looping at the - state) and then has an-
other bbb later. If the language we were interested in was more precisely the set of all
words in which the first triple b is preceded by a triple a, we need the more complex ma-
chine below:


Because an NFA is a type of TG and Kleene's theorem (p. 92) shows us by constructive
algorithm how to convert TGs into FAs, it follows that all NFAs can be converted into FAs
that accept the same language. Clearly, all FAs can be considered NFAs that do not make use
of the option of extra freedom of edge production. So as language acceptors, NFA = FA.


For every NFA, there is some FA that accepts exactly the same language.
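The conversion can be sketched with the subset construction, in which each FA state is the set of NFA states that some path could occupy. This is a minimal Python sketch, not a transcription of the book's proof: the NFA below is our own numbering of the triple-a, triple-b machine described earlier (states 1 through 7, with loops at 1, 4, and 7), only reachable subsets are generated, and the empty subset serves as a dead state. The resulting FA is compared with Python's `re` module on every word of length at most 10.

```python
import re
from itertools import product

def nfa_to_fa(start, finals, edges, alphabet="ab"):
    """Subset construction: each FA state is a frozenset of NFA states.

    The empty frozenset plays the role of a dead state (every path has
    crashed), so the resulting FA has an edge for every letter everywhere.
    """
    first = frozenset({start})
    delta, fa_finals, todo, seen = {}, set(), [first], {first}
    while todo:
        subset = todo.pop()
        if subset & finals:
            fa_finals.add(subset)
        for c in alphabet:
            target = frozenset(t for s in subset for t in edges.get((s, c), ()))
            delta[(subset, c)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return first, fa_finals, delta

def run(fa, word):
    """Trace a word through the FA returned by nfa_to_fa."""
    state, fa_finals, delta = fa
    for c in word:
        state = delta[(state, c)]
    return state in fa_finals

# Hypothetical NFA (our own numbering) for a triple a followed later by a
# triple b: 1 is the start, 7 is final, and states 1, 4, 7 loop on a and b.
edges = {}
for q in (1, 4, 7):
    for c in "ab":
        edges.setdefault((q, c), set()).add(q)
for q, c, r in [(1, "a", 2), (2, "a", 3), (3, "a", 4),
                (4, "b", 5), (5, "b", 6), (6, "b", 7)]:
    edges.setdefault((q, c), set()).add(r)

fa = nfa_to_fa(1, {7}, edges)
pattern = re.compile(r"(a|b)*aaa(a|b)*bbb(a|b)*")
agrees = all(run(fa, w) == (pattern.fullmatch(w) is not None)
             for n in range(11) for w in map("".join, product("ab", repeat=n)))
print(agrees)
```

It prints True. Note that bbbaaabbb is accepted, which is exactly the extra behavior of this machine pointed out earlier: a bbb before the first aaa is absorbed by the loop at the start state.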










EXAMPLE


A simple NFA that accepts the language {bb, bbb} is


PROBLEMS


1. In Chapter 5, Problem 10, we began the discussion of all possible FAs with two states.
Write a regular expression for each machine of type 2 and type 3 by using the conversion
algorithm described in the proof of Theorem 6, Part 2. Even though there is no algorithm
for recognizing the languages, try to identify as many as possible in the attempt to dis-
cover how many different languages can be accepted by a two-state FA.


For Problems 3 through 12, use the following machines:


3. Using the algorithm of Kleene's theorem, Part 3, Rule 2, Proof 1, construct FAs for the
following union languages:

(i) FA1 + FA2

(ii) FA1 + FA3

(iii) FA2 + FA3


4. Using the algorithm of Kleene's theorem, Part 3, Rule 2, Proof 2, construct NFAs for the
following languages:

(i) FA1 + FA2
(ii) FA1 + FA3
(iii) FA2 + FA3

5. Using the algorithm of Theorem 6, Part 3, Rule 3, construct FAs for the following prod-
uct languages:

(i) FA1FA2

(ii) FA1FA3

(iii) FA1FA1

(iv) FA2FA1

(v) FA2FA2

6. Using the algorithm of Part 3, Rule 4, construct FAs for the following languages:

(i) (FA1)*

(ii) (FA2)*

7. We are now interested in proving Part 3, Rule 3, of Kleene's theorem by NFAs. The ba-
sic theory is that when we reach any + state in FA1, we could continue on FA1 by follow-
ing its a-edge and b-edge, or we could pretend that we have jumped to FA2 by following

9. We can use NFAs to prove Theorem 6, Part 3, Rule 4, as well. The idea is to add a
nondeterministic jump from any + state to the states reachable from the - state by a-
and b-edges.

(i) Provide the details for this proof by constructive algorithm.

(ii) Draw the resultant NFA for (FA1)*.

(iii) Draw the resultant NFA for (FA2)*.

(iv) Draw the resultant NFA for (FA3)*.

10. Convert the machines in Problem 9(ii) and (iii) above to FAs by the algorithm in the
proof of Theorem 7.

11. Find FAs for the following languages:

(i) FA4FA4
(ii) (FA4)*

12. (i) Is the machine for FA1FA1 (Problem 5) the same as the machine for (FA1)* (Prob-
lem 6)? Are the languages the same?

(ii) Is the machine for FA4FA4 the same as the machine for (FA4)* (Problem 11)? Are
the languages the same?



For the examples derived earlier, which algorithmic method produces product ma-
chines with fewer states, the direct (Problem 5) or the NFA (Problem 8)?

If some automaton, FA1, has n states and some other automaton, FA2, has m states,
what is the maximum number of states possible in each of the machines corre-
sponding to FA1 + FA2, FA1FA2, and (FA1)* that are produced

(a) By the subset method described in the proof of Kleene's theorem?

(b) By building NFAs and then converting them into FAs?

Convert each of the following NFAs into FAs using the constructive algorithm presented
in Proof 2 of Theorem 7.

(i)
17. Using the result in Problem 15, find a third proof of Part 3 of Kleene's theorem:

(i) Rule 2

(ii) Rule 3

(iii) Rule 4

18. (i) Find two different machines FA1 and FA2 such that the languages accepted by
FA1 + FA2 and FA1FA2 are the same, yet the machines generated by the algorithm in
the proof of Theorem 6 are different.

(ii) Find two different machines FA1 and FA2 such that the algorithm in the proof of
Theorem 6 creates the same machine for (FA1)* and (FA2)*.

19. For the language accepted by the following machine, find a different FA with four
states. Find an NFA that accepts the same language and has only seven edges (where
edges with two labels are counted twice).


For Problems 15 through 17, let us now introduce a machine called "a nondeterministic fi-
nite automaton with null string labels," abbreviated NFA-Λ. This machine follows the
same rules as an NFA except that we are allowed to have edges labeled Λ.

15. Show that it is possible to use a technique analogous to that used in Proof 2 of Theorem
7 to constructively convert an NFA-Λ into an FA by explicitly giving the steps of the
conversion process.

16. Convert the following NFA-Λ's into FAs using the algorithm invented in Problem 15.
20. A one-person game can be converted into an NFA as follows. Let every possible board
situation be a state. If any move (there may be several types of moves, but we are not in-
terested in distinguishing among them) can change some state x into some state y, then
draw an edge from x to y and label it m. Label the initial position - and the winning po-
sitions +. "This game can be won in five moves" is the same as saying, "m⁵ is accepted
by this NFA." Once we have the NFA, we use the algorithm of Chapter 7 to convert it
into a regular expression. The language it represents tells us how many moves are in
each winning sequence.

Let us do this with the following example. The game of Flips is played with three
coins. Initially, they are all heads. A move consists of flipping two coins simultaneously
from whatever they were to the opposite side. For example, flipping the end coins
changes THH into HHT. We win when all three coins are tails. There are eight possible
states: HHH, HHT, . . . , TTT. The only - is HHH; the only + is TTT. Draw this NFA,
labeling any edge that can flip between states with the letter m.

Convert this NFA into a regular expression. Is m³ or m⁵ in the language of this ma-
chine? The shortest word in this language is the shortest solution of this puzzle.
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| j n 0 ur discussion of finite automata in Chapter 5, our motivation was in part to begin to de- 
IPJjgn a mathematical model for a computer. We said that the input string represents the pro- 
‘ * gram and input data. Reading the letters from the string is analogous to executing instruc- 
1 lions in that it changes the state of the machine; that is, it changes the contents of memory, 
) changes the control section of the computer, and so on. Part of this “and so on,” that was not 
f made explicit before, is the question of output. We mentioned that we could consider the 
output as part of the total state of the machine. This could mean two different things: one, 
#HEhat to enter a specific computer state means change to memory a certain way and print a 
specific character; or two, that a state includes both the present condition of memory plus the 
!‘;?|0tal output thus far. In other words, the state could reflect (in addition to the status of the 

- running program) (i) what we are now printing or (ii) what we have printed in total. One nat- 
P'yral question to ask is, “If we have these two different models, do these machines have equal 

power or are there some tasks that one can do that the other cannot?” 

The only explicit task a machine has done so far is to recognize a language. Computers,
as we know, often have the more useful function of performing calculations and conveying
results. In this chapter, we expand the notion of machine task.

If we assume that all the printing of output is to be done at the end of the program run,
at which time we have an instruction that dumps a buffer that has been assembled, then we
have a maximum on the number of characters that the program can print, namely, the size of
the buffer. However, theoretically we should be able to have outputs of any finite length. For
example, we might simply want to print out a copy of the input string, which could itself be
arbitrarily long.

- These are questions that have to be faced if we are to claim that our mathematical models 
I °f FAs and TGs represent actual physical machines. In this chapter, we shall investigate two 
I ndifferent models for FAs with output capabilities. These were created by G. H. Mealy (1955) 
and, independently, by E. F. Moore (1956). The original purpose of the inventors was to design 
H ^mathematical model for sequential circuits, which are only one component of the architecture 
of a whole computer. It is an important component and, as we shall see, acts as a machine all 
by itself. We shall present these two models, prove that they are equivalent, and give some ex- 
~ - amples of how they arise in the “logic” section of a computer. 
20. A one-person game can be converted into an NFA as follows. Let every possible board situation be a state. If any move (there may be several types of moves, but we are not interested in distinguishing among them) can change some state x into some state y, then draw an edge from x to y and label it m. Label the initial position − and the winning positions +. “This game can be won in five moves” is the same as saying, “m⁵ is accepted by this NFA.” Once we have the NFA, we use the algorithm of Chapter 7 to convert it into a regular expression. The language it represents tells us how many moves are in each winning sequence.

Let us do this with the following example. The game of Flips is played with three coins. Initially, they are all heads. A move consists of flipping two coins simultaneously from whatever they were to the opposite side. For example, flipping the end coins changes THH into HHT. We win when all three coins are tails. There are eight possible states: HHH, HHT, . . . , TTT. The only − state is HHH; the only + state is TTT. Draw this NFA, labeling any edge that can flip between states with the letter m.

Convert this NFA into a regular expression. Is m³ or m⁵ in the language of this machine? The shortest word in this language is the shortest solution of this puzzle. What is it?
In our discussion of finite automata in Chapter 5, our motivation was in part to begin to design a mathematical model for a computer. We said that the input string represents the program and input data. Reading the letters from the string is analogous to executing instructions in that it changes the state of the machine; that is, it changes the contents of memory, changes the control section of the computer, and so on. Part of this “and so on” that was not made explicit before is the question of output. We mentioned that we could consider the output as part of the total state of the machine. This could mean two different things: one, that entering a specific computer state means changing memory in a certain way and printing a specific character; or two, that a state includes both the present condition of memory and the total output thus far. In other words, the state could reflect (in addition to the status of the running program) (i) what we are now printing or (ii) what we have printed in total. One natural question to ask is, “If we have these two different models, do these machines have equal power, or are there some tasks that one can do that the other cannot?”

The only explicit task a machine has done so far is to recognize a language. Computers, 
as we know, often have the more useful function of performing calculations and conveying 
results. In this chapter, we expand the notion of machine task. 

If we assume that all the printing of output is to be done at the end of the program run, 
at which time we have an instruction that dumps a buffer that has been assembled, then we 
have a maximum on the number of characters that the program can print, namely, the size of 
the buffer. However, theoretically we should be able to have outputs of any finite length. For 
example, we might simply want to print out a copy of the input string, which could itself be 
arbitrarily long. 

These are questions that have to be faced if we are to claim that our mathematical models of FAs and TGs represent actual physical machines. In this chapter, we shall investigate two different models for FAs with output capabilities. These were created by G. H. Mealy (1955) and, independently, by E. F. Moore (1956). The original purpose of the inventors was to design a mathematical model for sequential circuits, which are only one component of the architecture of a whole computer. It is an important component and, as we shall see, acts as a machine all by itself. We shall present these two models, prove that they are equivalent, and give some examples of how they arise in the “logic” section of a computer.
DEFINITION

A Moore machine is a collection of five things:

1. A finite set of states q₀, q₁, q₂, . . . , where q₀ is designated as the start state.

2. An alphabet of letters for forming the input string

Σ = {a b c . . .}

3. An alphabet of possible output characters

Γ = {x y z . . .}

4. A transition table that shows for each state and each input letter what state is reached next.

5. An output table that shows what character from Γ is printed by each state as it is entered. ■

Notice that we did not assume that the input alphabet Σ is the same as the output alphabet Γ. When dealing with contemporary machines, both input and output are usually encoded strings of 0's and 1's. However, we may interpret the input bit strings as instructions in a programming language followed by the data to be processed. We may also wish to group the strings of output bits into codes for typewriter characters. We discuss whether it is necessary to have more than two letters in an alphabet in Chapter 23.

To keep the output alphabet separate from the input alphabet, we give it a different name, Γ instead of Σ, and for its letters we use symbols from the other end of the Latin alphabet: {x y z . . .} or numbers {0 1 . . .} instead of {a b c . . .}. Moreover, we refer to the input symbols (as we always have) as letters, whereas we call the output symbols characters.

As we shall see from our circuitry examples, the knowledge of which state is the start state is not always important in applications. If the machine is run several times, it may continue from where it left off rather than restart. Because of this, we can define the Moore machine in two ways: Either the first symbol printed is the character always specified in the start state, or else it is the character specified in the next state, which is the first state chosen by the input. We shall adopt the policy that a Moore machine always begins by printing the character dictated by the mandatory start state. This difference is not significant. If the input string has seven letters, then the output string will have eight characters because it includes eight states in its path.

Because the word “outputted” is so ugly, we shall say “printed” instead, even though we 
realize that the output device does not technically have to be a printer. 

A Moore machine does not define a language of accepted words, because every possible input string creates an output string and there is no such thing as a final state. The processing is terminated when the last input letter is read and the last output character is printed. Nevertheless, there are several subtle ways to turn Moore machines into language-definers.

Moore machines have pictorial representations very similar to their cousins, the FAs. We start with little circles depicting the states and directed edges between them labeled with input letters. The difference is that instead of having only the name of the state inside the little circle, we also specify the output character printed by that state. The two symbols inside the circle are separated by a slash (/). On the left side is the name of the state and on the right is the output from that state.
EXAMPLE

Let us consider an example defined first by a table:

Input alphabet: Σ = {a b}

Output alphabet: Γ = {0 1}

Names of states: q₀, q₁, q₂, q₃ (q₀ = start state)


Transition Table

Old State   Output by the Old State   New State After Input a   New State After Input b
−q₀         1                         q₁                        q₃
q₁          0                         q₃                        q₁
q₂          0                         q₀                        q₃
q₃          1                         q₃                        q₂


The pictorial representation of this Moore machine is 


[Figure: the Moore machine above, with states drawn as q₀/1, q₁/0, q₂/0, q₃/1]



In Moore machines, so much information is written inside the state circles that there is 
no room for the minus sign indicating the start state. We usually indicate the start state by an 
outside arrow as shown above. As mentioned before, there is no need for any plus signs 
either. 

Let us trace the operation of this machine on the input string abab. We always start this machine off in state q₀, which automatically prints out the character 1. We then read the first letter of the input string, which is an a and which sends us to state q₁. This state tells us to print a 0. The next input letter is a b, and the loop shows that we return to state q₁. Being in q₁ again, we print another 0. Then we read an a, go to q₃, and print a 1. Next, we read a b, go to q₂, and print a 0. This is the end of the run. The output sequence has been 10010. ■
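For readers who want to experiment, here is a minimal Python sketch of a Moore machine run. The dictionary encoding (each state maps to the character it prints and its next states) and the names `MOORE_EXAMPLE` and `run_moore` are our own illustrative assumptions, not notation from the text. The sketch encodes the transition table above and reproduces the trace, including the fact that n input letters give n + 1 output characters.

```python
# Each state maps to (character it prints, {input letter: next state}).
MOORE_EXAMPLE = {
    "q0": ("1", {"a": "q1", "b": "q3"}),
    "q1": ("0", {"a": "q3", "b": "q1"}),
    "q2": ("0", {"a": "q0", "b": "q3"}),
    "q3": ("1", {"a": "q3", "b": "q2"}),
}


def run_moore(machine, start, word):
    """Return the output string of a Moore machine on the input `word`."""
    state = start
    output = [machine[state][0]]  # the start state prints before any input is read
    for letter in word:
        state = machine[state][1][letter]  # follow the edge labeled with this letter
        output.append(machine[state][0])   # print the character of the state entered
    return "".join(output)


print(run_moore(MOORE_EXAMPLE, "q0", "abab"))  # 10010
```

The empty input still produces one character, the automatic start-state character, which is exactly the policy adopted above.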

EXAMPLE 

Suppose we were interested in knowing exactly how many times the substring aab occurs in 
a long input string. The following Moore machine will count this for us. 


[Figure: a Moore machine with states q₀/0, q₁/0, q₂/0, q₃/1 that counts occurrences of aab]
Every state of this machine prints out the character 0 except for state q₃, which prints a 1. To get to state q₃, we must have come from state q₂ and have just read a b. To get to state q₂, we must have just read at least two a's in a row, having started in any state. After finding the substring aab and tallying a 1 for it, we begin to look for the next aab. If we read a b, we start the search in q₀; if we read an a, we start in q₁. The number of substrings aab in the input string will be exactly the number of 1's in the output string.


Input           a    a    a    b    a    b    b    a    a    b    b
State      q₀   q₁   q₂   q₂   q₃   q₁   q₀   q₀   q₁   q₂   q₃   q₀
Output     0    0    0    0    1    0    0    0    0    0    1    0


The example above is part of a whole class of useful Moore machines. Given a language L and an FA that accepts it, if we add the printing instruction 0 to every nonfinal state and 1 to every final state, the 1's in any output sequence mark the end positions of all substrings of the input string that start at the first letter and are words in L. In this way, a Moore machine can be said to define the language of all input strings whose output ends in a 1. The machine above, with q₀ = − and q₃ = +, accepts all words that end in aab.
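The same encoding works for the aab counter. The transitions below are not copied from the figure; they are reconstructed from the trace table and the description above, so treat them as an assumption. The helper names `count_aab` and `ends_in_aab` are hypothetical. They illustrate the two readings of the machine: counting the 1's, and accepting a word exactly when the last character printed is a 1.

```python
from itertools import product

# Transitions reconstructed from the trace table and the prose (an assumption).
AAB_COUNTER = {
    "q0": ("0", {"a": "q1", "b": "q0"}),  # no progress toward aab
    "q1": ("0", {"a": "q2", "b": "q0"}),  # just read one a
    "q2": ("0", {"a": "q2", "b": "q3"}),  # just read at least two a's
    "q3": ("1", {"a": "q1", "b": "q0"}),  # just completed aab: tally a 1
}


def run_moore(machine, start, word):
    state = start
    output = [machine[state][0]]
    for letter in word:
        state = machine[state][1][letter]
        output.append(machine[state][0])
    return "".join(output)


def count_aab(word):
    """Number of occurrences of aab = number of 1's printed."""
    return run_moore(AAB_COUNTER, "q0", word).count("1")


def ends_in_aab(word):
    """Language view: accept exactly when the last character printed is 1."""
    return run_moore(AAB_COUNTER, "q0", word).endswith("1")


print(run_moore(AAB_COUNTER, "q0", "aaababbaabb"))  # 000010000010

# Brute-force check on all short words. aab cannot overlap itself,
# so Python's str.count gives the true number of occurrences.
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert count_aab(w) == w.count("aab")
        assert ends_in_aab(w) == w.endswith("aab")
```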


MEALY MACHINES 

Our next subject is another variation of the FA called the Mealy machine. A Mealy machine is like a Moore machine except that now we do our printing while we are traveling along the edges, not in the states themselves. If we are in state q₄ and we are proceeding to q₇, we do not simply print what q₇ tells us. What we print depends on the edge we take. If there are two different edges from q₄ to q₇, one an a-edge and one a b-edge, it is possible that they will have different printing instructions for us. We take no printing instructions from the state itself.


DEFINITION 

A Mealy machine is a collection of four things:

1. A finite set of states q₀, q₁, q₂, . . . , where q₀ is designated as the start state.

2. An alphabet of letters Σ = {a b . . .} for forming input strings.

3. An alphabet of output characters Γ = {x y z . . .}.

4. A pictorial representation with states represented by small circles and directed edges indicating transitions between states. Each edge is labeled with a compound symbol of the form i/o, where i is an input letter and o is an output character. Every state must have exactly one outgoing edge for each possible input letter. The edge we travel is determined by the input letter i. While traveling on the edge, we must print the output character o. ■


We have for the sake of variation defined a Mealy machine by its pictorial representation. One reason for this is that the table definition is not as simple as that for a Moore machine (see the Problem section, later).
EXAMPLE

The following picture represents a Mealy machine:

[Figure: a Mealy machine with states q₀, q₁, q₂, q₃]


Notice that when we arrive in state q₃ we may have just printed a 1 or a 0. If we came from state q₀ by the b-road, we printed a 0. If we got there from q₁ by the a-road, we printed a 1. If we got there from q₂, it depends on whether we took the a-road and printed a 0 or the b-road and printed a 1. If we were in q₃ already and looped back on the input a, we then printed a 1. Every time we enter q₁, we have just printed a 0; this time it is possible to tell this information from the destination state alone.

Let us trace the running of this machine on the input sequence aaabb. We start in state q₀. Unlike the Moore machine, here we do not have to print the same character each time we start up, even before getting a look at the input. The first input letter is an a, which takes us to q₁ and prints a 0. The second letter is an a, which takes us to q₃ and prints a 1. The third letter is an a, which loops us back to q₃ and prints a 1. The fourth letter is a b, which takes us back to q₀ and prints a 1. The fifth letter is a b, which takes us to q₃ and prints a 0. The output string for this input is 01110. ■

Notice that in a Mealy machine the output string has the same number of characters as 
the input string has letters. As with the Moore machine, the Mealy machine does not define a 
language by accepting and rejecting input strings, so it has no final states. However, we will 
see shortly that there is a sense in which it can recognize a language. 

If there are two edges going in the same direction between the same pair of states, we 
can draw only one arrow and represent the choice of label by the usual comma. 


[Figure: two states joined by a single arrow labeled a/x, b/y]


EXAMPLE 

One simple example of a useful Mealy machine is one that prints out the 1's complement of an input bit string. This means that we want to produce a bit string that has a 1 wherever the input string has a 0 and a 0 wherever the input has a 1. For example, the input 101 should become the output 010. One machine that does this is shown below.

[Figure: a single state with a loop labeled 0/1, 1/0]

If the input is 001010, the output is 110101. This is a case where the input alphabet and output alphabet are both {0 1}. ■
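A Mealy machine can be sketched in the same spirit, except that the printed character now lives on the edge. In this illustrative encoding (our own assumption), each state maps an input letter to a pair (next state, character printed); `run_mealy` is a hypothetical helper name. The 1's complement machine needs only one state.

```python
# A Mealy machine as {state: {input letter: (next state, character printed)}}.
ONES_COMPLEMENT = {"q0": {"0": ("q0", "1"), "1": ("q0", "0")}}


def run_mealy(machine, start, word):
    """Return the output string; it has exactly one character per input letter."""
    state = start
    output = []
    for letter in word:
        state, char = machine[state][letter]  # travel the edge and print its character
        output.append(char)
    return "".join(output)


print(run_mealy(ONES_COMPLEMENT, "q0", "001010"))  # 110101
```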


EXAMPLE 

We now consider a Mealy machine called the increment machine that assumes that its input is a binary number and prints out the binary number that is one larger. We assume that the input bit string is a binary number fed in backward, that is, units digit first (then 2's digit, 4's digit, . . .). The output string will be the binary representation of the number one greater and will also be generated right to left.

The machine will have three states: start, owe-carry, no-carry. The owe-carry state represents the overflow when two bits equal to 1 are added: we print a 0 and we carry a 1.

From the start state, we read the first bit. If we read in a 0, we print a 1 and we do not 
owe a carry bit. If we read a 1, we print a 0 and we do owe a carry bit. If at any point in the 
process we are in no-carry (which means that we do not owe a carry), we print the next bit 
just as we read it and remain in no-carry. However, if at some point in the process we are in 
owe-carry, the situation is different. If we read a 0, we print a 1 and go to the no-carry state. 
If we are in owe-carry and we read a 1, we print a 0 and we loop back to owe-carry. The 
complete picture for this machine is 

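[Figure: the increment machine, with states start, owe-carry, and no-carry; no-carry has a loop labeled 0/0, 1/1, and owe-carry has a loop labeled 1/0]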

Let us watch this machine in action on the binary representation for the number 11, 1011. The string is fed into the machine as 1101 (backward). The first 1 causes a 0 to be printed and sends us to owe-carry. The next 1 causes a 0 to be printed and loops back to owe-carry. The next input letter is a 0 and causes a 1 to be printed on our way to no-carry. The next bit, 1, is printed out, as it is fed in, on the no-carry loop. The total output string is 0011, which when reversed is 1100, and is, as desired, the binary representation for the number 12.

As simple as this machine is, it can be simplified even further (see Problem 7). 

This machine has the typical Mealy machine property that the output string is exactly as 
long as the input string. This means that if we ran this incrementation machine on the input 
1111, we would get 0000. We must interpret the owe-carry state as an overflow situation if a 
string ever ends there. ■ 
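Here is a sketch of the three-state increment machine in the same hypothetical dictionary encoding. The wrapper `increment` (an illustrative name) reverses the numeral so that the machine sees the units digit first, reverses the output back, and reports whether the run ended in owe-carry, which is how we read an overflow.

```python
INCREMENT = {
    "start":     {"0": ("no-carry", "1"), "1": ("owe-carry", "0")},
    "owe-carry": {"0": ("no-carry", "1"), "1": ("owe-carry", "0")},
    "no-carry":  {"0": ("no-carry", "0"), "1": ("no-carry", "1")},
}


def run_mealy(machine, start, word):
    """Return (output string, final state) for a Mealy machine run."""
    state = start
    output = []
    for letter in word:
        state, char = machine[state][letter]
        output.append(char)
    return "".join(output), state


def increment(binary):
    """Add 1 to a binary numeral written most significant digit first.

    The string is reversed on the way in (units digit first) and the output is
    reversed on the way out. The flag is True when the run ends in owe-carry.
    """
    output, final = run_mealy(INCREMENT, "start", binary[::-1])
    return output[::-1], final == "owe-carry"


print(increment("1011"))  # ('1100', False)
print(increment("1111"))  # ('0000', True)
```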
There is a connection between Mealy machines and sequential circuits (which we touch on at the end of this chapter) that makes them a very valuable component of computer theory. The two examples we have just presented are also valuable to computing. Once we have an incrementer, we can build a machine that can perform the addition of binary numbers, and then we can use the 1's complementing machine to build a subtracting machine based on the following principle:

If a and b are strings of bits, then the subtraction a - b can be performed by

(1) adding the 1's complement of b to a, ignoring any overflow digit, and

(2) incrementing the result by 1.

For example,

14 - 5 (decimal) = 1110 - 0101 (binary)
                 = 1110 + 1's complement of 0101 + 1 (binary)
                 = 1110 + 1010 + 1 (binary)
                 = [1]1001 (binary) = 9 (decimal)   (dropping the [1])

18 - 7 = 10010 - 00111 = 10010 + 11000 + 1
       = [1]01011 = 01011 = 11 (decimal)

The same trick works in decimal notation if we use 9's complements, that is, replace each digit d in the second number by the digit (9 - d). For example,

46 - 17 = 46 + 82 + 1 = [1]29 = 29.
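The complement trick can be checked numerically. This sketch uses Python integers for the addition step, since the text does not give an addition machine. It computes the complements digit by digit as described and drops the overflow digit by reducing modulo 2^width or 10^width. It assumes a ≥ b and pads b with leading 0's to the width of a. The function names are illustrative.

```python
def ones_complement(bits):
    """Flip every bit: what the one-state complementing machine prints."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def nines_complement(digits):
    """Replace each decimal digit d by 9 - d."""
    return "".join(str(9 - int(d)) for d in digits)


def subtract_binary(a, b):
    """a - b (assumes a >= b) as a + (1's complement of b) + 1, overflow dropped."""
    width = len(a)
    total = int(a, 2) + int(ones_complement(b.zfill(width)), 2) + 1
    return format(total % 2 ** width, f"0{width}b")  # reducing mod 2**width drops the [1]


def subtract_decimal(a, b):
    """The same trick in base 10 with 9's complements (assumes a >= b)."""
    width = len(a)
    total = int(a) + int(nines_complement(b.zfill(width))) + 1
    return str(total % 10 ** width).zfill(width)


print(subtract_binary("1110", "0101"))    # 1001  (14 - 5 = 9)
print(subtract_binary("10010", "00111"))  # 01011 (18 - 7 = 11)
print(subtract_decimal("46", "17"))       # 29
```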


EXAMPLE 

Even though a Mealy machine does not accept or reject an input string, it can recognize a language by making its output string answer some questions about the input. We have discussed before the language of all words that have a double letter in them. The Mealy machine below will take a string of a's and b's and print out a string of 0's and 1's such that if the nth output character is a 1, it means that the nth input letter is the second in a pair of double letters. For example, ababbaab becomes 00001010 with 1's in the position of the second of each pair of repeated letters.



[Figure: a Mealy machine with a start state and states labeled “just read a” and “just read b”]


This is similar to the Moore machine that recognized the number of occurrences of the substring aab. This machine recognizes the occurrences of aa or bb. Notice that the triple-letter word aaa produces the output 011 since the second and third letters are both the back end of a pair of double a's. ■
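Here is a sketch of the double-letter machine. It assumes the three states are named start, “just read a,” and “just read b”; these names are our reading of the figure, not a certainty. From the second letter on, the machine prints 1 exactly when the current letter repeats the previous one.

```python
DOUBLE_LETTER = {
    "start":       {"a": ("just read a", "0"), "b": ("just read b", "0")},
    "just read a": {"a": ("just read a", "1"), "b": ("just read b", "0")},
    "just read b": {"a": ("just read a", "0"), "b": ("just read b", "1")},
}


def run_mealy(machine, start, word):
    state = start
    output = []
    for letter in word:
        state, char = machine[state][letter]  # the edge's output marks a repeated letter
        output.append(char)
    return "".join(output)


print(run_mealy(DOUBLE_LETTER, "start", "ababbaab"))  # 00001010
print(run_mealy(DOUBLE_LETTER, "start", "aaa"))       # 011
```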


MOORE = MEALY 

So far, our definition of the equivalence of two machines has been that they accept the same language. In this sense, we cannot compare a Mealy machine and a Moore machine. However, we may say that two output automata are equivalent if they always give the same output string when presented with the same input string. In this way, two Mealy machines may be equivalent and two Moore machines may be equivalent, but a Moore machine can never be directly equivalent to a Mealy machine because the length of the output string from a Moore machine is one longer than that from a Mealy machine given the same input. The problem is that a Moore machine always begins with one automatic start symbol.

To get around this difficulty, we define a Mealy machine to be equivalent to a Moore 
machine whenever they always result in the same output if the automatic start symbol for the 
Moore machine is deleted from the front of the output. 

DEFINITION 

Given the Mealy machine Me and the Moore machine Mo, which prints the automatic start- 
state character x, we will say that these two machines are equivalent if for every input string 
the output string from Mo is exactly x concatenated with the output from Me. ■ 

Rather than debate the merits of the two types of machines, we prove that for every Moore machine there is an equivalent Mealy machine and for every Mealy machine there is an equivalent Moore machine. We can then say that the two types of machines are functionally equivalent.


THEOREM 8 

If Mo is a Moore machine, then there is a Mealy machine Me that is equivalent to it. 

PROOF 

The proof will be by constructive algorithm. 

Consider any particular state in Mo; call it q₄. It gives instructions to print a certain character; call it t. Let us consider all the edges that enter this state. Each of them is labeled with an input letter. Let us change this. Let us relabel all the edges coming into q₄. If they were previously labeled a or b or c . . . , let them now be labeled a/t or b/t or c/t . . . , and let us erase the t from inside the state q₄. This means that we shall be printing a t on the incoming edges before they enter q₄.
We leave the outgoing edges from q₄ alone. They will be relabeled to print the character associated with the state to which they lead.

If we repeat this procedure for every state q₀, q₁, . . . , we turn Mo into a Mealy machine Me. As we move from state to state, the things that get printed are exactly what Mo would have printed itself.

The symbol that used to be printed automatically when the machine started in state q₀ is no longer the first output character, but this does not stop the rest of the output string from being the same.

Therefore, every Mo is equivalent to some Me. ■
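The construction in the proof of Theorem 8 is mechanical enough to write down directly. In this sketch, the dictionary encodings and function names are our own assumptions: every edge into a state is relabeled to print the character that state used to print. The loop at the end checks equivalence in the sense of the definition above on all short words, using the Moore machine from the first example of this chapter.

```python
from itertools import product

MOORE_EXAMPLE = {
    "q0": ("1", {"a": "q1", "b": "q3"}),
    "q1": ("0", {"a": "q3", "b": "q1"}),
    "q2": ("0", {"a": "q0", "b": "q3"}),
    "q3": ("1", {"a": "q3", "b": "q2"}),
}


def moore_to_mealy(moore):
    """Each edge into state q prints the character q used to print; states print nothing."""
    return {
        state: {letter: (target, moore[target][0]) for letter, target in edges.items()}
        for state, (_, edges) in moore.items()
    }


def run_moore(machine, start, word):
    state, output = start, [machine[start][0]]
    for letter in word:
        state = machine[state][1][letter]
        output.append(machine[state][0])
    return "".join(output)


def run_mealy(machine, start, word):
    state, output = start, []
    for letter in word:
        state, char = machine[state][letter]
        output.append(char)
    return "".join(output)


MEALY_EXAMPLE = moore_to_mealy(MOORE_EXAMPLE)
print(MEALY_EXAMPLE["q0"])  # {'a': ('q1', '0'), 'b': ('q3', '1')}

# Moore output = automatic start character + Mealy output, for every short word.
for n in range(6):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        assert run_moore(MOORE_EXAMPLE, "q0", word) == "1" + run_mealy(MEALY_EXAMPLE, "q0", word)
```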

EXAMPLE 

Below, a Moore machine is converted into a Mealy machine by the algorithm of the proof 
above: 



THEOREM 9 

For every Mealy machine Me, there is a Moore machine Mo that is equivalent to it. 

PROOF 

Again, the proof will be by constructive algorithm. 

We cannot just do the reverse of the previous procedure. If we were to try to push the 
printing instruction from the edge as it is in Me to the inside of the state as it should be for a 
Moore machine, we might end up with a conflict. Two edges might come into the same state 
but have different printing instructions, as in this example: 
We must select the start state for the new machine, so let us arbitrarily select q₀¹. Notice that we now have two edges that cross. This sometimes happens, but aside from making a messier picture, there is no real problem in understanding which edge goes where. Notice that the edge from q₁ to q₀¹, which used to be labeled a/0, is now only labeled a because the instruction to print the 0 is found in the state q₀¹/0. The same is true for the edge from q₃ to q₀¹, which also loses its printing instruction.

State q₁ has only two edges coming into it: one from q₂ labeled a/1 and a loop labeled b/1. So whenever we enter q₁, we are always printing a 1. We have no trouble here transferring the print instructions from the edges into the state. The machine now looks like this:



What we have now is a partially converted machine or hybrid. We could run an input 
string on this machine, and it would give us the same output as the original Me. The rules are 
that if an edge says print, then print; if a state says print, then print. If not, do not. 

Let us continue the conversion. State q₃ is easy to handle. Two edges come into it, both labeled b/0, so we change the state to q₃/0 and simplify the edge labels to b alone:

[Figure: the machine after q₃ has been converted to q₃/0]



The only job left is to convert state q₂. It has some 0-printing edges entering it and some 1-printing edges (actually two of each, counting the loop). Therefore, we must split it into two copies, q₂¹ and q₂². Let the first print a 0 and the second print a 1. The two copies will be connected by a b-edge going from q₂² to q₂¹ (to print a 0). There will also be a b-loop at q₂¹. The final machine is


[Figure: the final Moore machine]
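The state-splitting idea of Theorem 9 can be sketched as follows. Each Mealy state q is replaced by copies (q, o), one for each character o printed on an edge entering q. The copy prints o, and an edge labeled i/o from q into q' becomes an i-edge into the copy (q', o). Unlike the hand construction, this sketch builds only the copies reachable from the start state. The character printed by the new start state is chosen arbitrarily and passed in as `start_char`. The increment machine serves as a test case; all names are illustrative.

```python
from itertools import product

INCREMENT = {
    "start":     {"0": ("no-carry", "1"), "1": ("owe-carry", "0")},
    "owe-carry": {"0": ("no-carry", "1"), "1": ("owe-carry", "0")},
    "no-carry":  {"0": ("no-carry", "0"), "1": ("no-carry", "1")},
}


def mealy_to_moore(mealy, start, start_char):
    """Split each Mealy state q into copies (q, o) that print o."""
    moore = {}
    new_start = (start, start_char)
    pending = [new_start]
    while pending:
        state = pending.pop()
        if state in moore:
            continue                      # this copy has already been built
        q, char = state
        edges = {}
        for letter, (target, printed) in mealy[q].items():
            copy = (target, printed)      # the copy of target that prints `printed`
            edges[letter] = copy
            pending.append(copy)
        moore[state] = (char, edges)      # the copy prints its own character
    return moore, new_start


def run_moore(machine, start, word):
    state, output = start, [machine[start][0]]
    for letter in word:
        state = machine[state][1][letter]
        output.append(machine[state][0])
    return "".join(output)


def run_mealy(machine, start, word):
    state, output = start, []
    for letter in word:
        state, char = machine[state][letter]
        output.append(char)
    return "".join(output)


MOORE_INCREMENT, MOORE_START = mealy_to_moore(INCREMENT, "start", "0")
print(len(MOORE_INCREMENT))  # 4

for n in range(7):
    for letters in product("01", repeat=n):
        word = "".join(letters)
        assert run_moore(MOORE_INCREMENT, MOORE_START, word) == "0" + run_mealy(INCREMENT, "start", word)
```

Here no-carry is split in two, because it is entered by edges printing 0 and by edges printing 1. Owe-carry is not split, because every edge entering it prints 0.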



TRANSDUCERS AS MODELS OF SEQUENTIAL CIRCUITS 

The student of computer science may already have met these machines in courses on computer logic or architecture. They are commonly used to describe the action of sequential circuits that involve flip-flops and other feedback electronic devices for which the output of the circuit is not only a function of the specific instantaneous inputs, but also a function of the previous state of the system. The total amount of history of the input string that can be “remembered” in a finite automaton is bounded by a function of the number of states the automaton has. Automata with input and output are sometimes called transducers because of their connection to electronics.

EXAMPLE 

Let us consider an example of a simple sequential circuit. The box labeled NAND means “not and.” Its output wire carries the complement of the Boolean AND of its input wires. The output of the box labeled DELAY is the same as its previous input. It delays transmission of the signal along the wire by one step (clock pulse). The DELAY is sometimes called a D flip-flop. The AND and OR are as usual. Current in a wire is denoted by the value 1, no current by 0.



We identify four states based on whether or not there is current at points A and B in the circuit:

q₀ is A = 0, B = 0

q₁ is A = 0, B = 1

q₂ is A = 1, B = 0

q₃ is A = 1, B = 1

The operation of this circuit is such that after an input of 0 or 1, the state changes according to the following rules:

New B = old A

New A = (input) NAND (old A OR old B)

Output = (input) OR (old B)

At each of a sequence of discrete pulses of a time clock, an input bit is received, the state changes, and an output bit is generated.

Suppose we are in state q₀ and we receive the input 0:

New B = old A = 0

New A = (input) NAND (old A OR old B)
      = (0) NAND (0 OR 0)
      = 0 NAND 0
      = 1

Output = 0 OR 0 = 0

The new state is q₂ (because new A = 1, new B = 0).

If we are in state q₀ and we receive the input 1,

New B = old A = 0

New A = 1 NAND (0 OR 0) = 1

Output = 1 OR 0 = 1
The new state is q₂ (because the new A = 1 and the new B = 0).

If we are in q₁ and we receive the input 0,

New B = old A = 0

New A = 0 NAND (0 OR 1) = 1

Output = 0 OR 1 = 1

The new state is q₂.

If we are in q₁ and we receive the input 1,

New B = old A = 0

New A = 1 NAND (0 OR 1) = 0

Output = 1 OR 1 = 1

The new state is q₀.

If we are in state q₂ and we receive the input 0,

New B = old A = 1

New A = 0 NAND (1 OR 0) = 1

Output = 0 OR 0 = 0

The new state is q₃.

If we are in q₂ and we receive the input 1,

New B = old A = 1

New A = 1 NAND (1 OR 0) = 0

Output = 1 OR 0 = 1

The new state is q₁.

If we are in q₃ and we receive the input 0,

New B = old A = 1

New A = 0 NAND (1 OR 1) = 1

Output = 0 OR 1 = 1

The new state is q₃.

If we are in q₃ and we receive the input 1,

New B = old A = 1

New A = 1 NAND (1 OR 1) = 0

Output = 1 OR 1 = 1

The new state is q₁.


                 After Input 0             After Input 1
Old State    New State    Output       New State    Output
q₀           q₂           0            q₂           1
q₁           q₂           1            q₀           1
q₂           q₃           0            q₁           1
q₃           q₃           1            q₁           1
If we input two 0's, no matter which state we started from, we will get to state q₃. From there, the input string 011011 will cause the output sequence 111011. ■
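The circuit's rules can be simulated directly and tabulated into the Mealy machine above. This is a minimal sketch that assumes bits are the Python ints 0 and 1 and uses the naming q₀ through q₃ defined earlier. The helper names are hypothetical.

```python
def nand(x, y):
    return 1 - (x & y)


def step(a, b, bit):
    """One clock pulse: return (new A, new B, output) from the rules in the text."""
    new_b = a                  # New B = old A
    new_a = nand(bit, a | b)   # New A = (input) NAND (old A OR old B)
    output = bit | b           # Output = (input) OR (old B)
    return new_a, new_b, output


# q0 = (A, B) = (0, 0), q1 = (0, 1), q2 = (1, 0), q3 = (1, 1)
NAME = {(0, 0): "q0", (0, 1): "q1", (1, 0): "q2", (1, 1): "q3"}
BITS = {name: ab for ab, name in NAME.items()}


def mealy_table():
    """Tabulate the circuit as {state: {input: (next state, output)}}."""
    table = {}
    for name, (a, b) in BITS.items():
        table[name] = {}
        for bit in (0, 1):
            new_a, new_b, out = step(a, b, bit)
            table[name][str(bit)] = (NAME[(new_a, new_b)], str(out))
    return table


def run_circuit(state, word):
    """Feed a bit string to the circuit; return (final state, output string)."""
    a, b = BITS[state]
    output = []
    for ch in word:
        a, b, out = step(a, b, int(ch))
        output.append(str(out))
    return NAME[(a, b)], "".join(output)


for name, row in mealy_table().items():
    print(name, row)
print(run_circuit("q3", "011011"))  # ('q0', '111011')
```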


Comparison Table for Automata

                  FA            TG            NFA           NFA-Λ         MOORE         MEALY
Start states      One           One or more   One           One           One           One
Final states      Some or none  Some or none  Some or none  Some or none  None          None
Edge labels       Letters       Words         Letters       Letters from  Letters       i/o, i from Σ,
                  from Σ        from Σ*       from Σ        Σ and Λ       from Σ        o from Γ
Number of edges   One for each  Arbitrary     Arbitrary     Arbitrary     One for each  One for each
from each state   letter in Σ                                             letter in Σ   letter in Σ
Deterministic     Yes           No            No            No            Yes           Yes
Output            No            No            No            No            Yes           Yes
Page defined      53            79            135           146           150           152


PROBLEMS 

1. Each of the following is a Moore machine with input alphabet Σ = {a b} and output alphabet Γ = {0 1}. Given the transition and output tables, draw the machines.

(i)
        a     b     Output
q₀      ?     ?     1
q₁      ?     ?     0
q₂      q₁    q₀    1

(ii)
        a     b     Output
q₀      q₀    q₂    0
q₁      q₁    q₀    1
q₂      q₂    ?     1

(iii)
        a     b     Output
q₀      q₀    ?     1
q₁      q₀    q₂    0
q₂      ?     q₂    1
q₃      q₁    q₁    0

(iv)
        a     b     Output
q₀      ?     ?     0
q₁      q₁    q₀    0
q₂      q₂    q₃    1
q₃      q₀    q₁    0

(v)
        a     b     Output
q₀      q₁    q₂    0
q₁      q₂    q₃    0
q₂      q₃    q₄    1
q₃      q₄    q₄    0
q₄      q₀    q₀    0
2. (i) Based on the table representation for Moore machines, how many different Mo's are there with four states?

(ii) How many different Moore machines are there with n states?

3. For each of the following Moore machines, construct the transition and output tables: 




[Figures: Moore machines (i), (ii), and (iii)]





4. On each of the Moore machines in Problems 1 and 3, run the input sequence aabab. 
What are their respective outputs? 

5. Suppose we define a Less machine to be a Moore machine that does not automatically print the character of the start state. The first character it prints is the character of the second state it enters. From then on, for every state it enters it prints a character, even when it reenters the start state. In this way, the input string gets to have some say in what the first character printed is going to be. Show that these Less machines are equivalent to Mealy machines in the direct sense, that is, for every Less machine there is a Mealy machine that has the same output for every input string.

6. Mealy machines can also be defined by transition tables. The rows and the columns are both labeled with the names of the states. The entry in the table is the label of the edge (or edges) going from the row state to the column state (if there is no such edge, this entry is blank).

Construct the transition table for each of the four Mealy machines shown below: 


[Figures: Mealy machines (i) through (iv)]








7. The example of the increment machine on p. 154 used three states to perform its job. Show that two states are all that are needed.





8. Convert the Moore machines in Problem 3 into Mealy machines. 

9. Convert the Mealy machines in Problem 6 into Moore machines. 

10. Draw a Mealy machine equivalent to the following sequential circuit:



11. Construct a Mealy machine that produces an output string of solid 1's no matter what the input string is.

12. (i) Design a machine to perform a parity check on the input string; that is, the output string ends in 1 if the total number of 1-bits in the input string is odd and 0 if the total number of 1-bits in the input string is even (the front part of the output string is ignored).

(ii) In your answer to (i), did you choose a Mealy or Moore machine and why was that 
the right choice? 

13. Given a bit string of length n, the shift-left-cyclic operation places the first bit at the end, leaving the rest of the bits unchanged. For example, SLC(100110) = 001101.

(i) Build a Mealy machine with input and output alphabet {0 1 $} such that for any bit string x, when we input the n + 1 bits x$, we get as output the n + 1 bit string $SLC(x).

(ii) Explain why this cannot be done without a $.

For Problems 14 through 16, let (Me)² mean that given a Mealy machine, an input string is
processed and then the output string is immediately fed into the machine (as input) and
reprocessed. Only this second resultant output is considered the final output of (Me)². If the
final output string is the same as the original input string, we say that (Me)² has an identity
property. Symbolically, we write (Me)² = identity.
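The (Me)² construction is easy to experiment with. The sketch below is a minimal simulation, assuming a Mealy machine is stored as a dictionary from (state, input letter) to (next state, printed letter). The one-state machines IDENTITY and COMPLEMENT are hypothetical encodings of the identity and 1's-complement machines of Problem 14, whose figures are not reproduced here. Checking every string up to a fixed length is a test, not a proof.

```python
from itertools import product

# A Mealy machine as a dict: (state, input letter) -> (next state, printed letter).
# Hypothetical one-state encodings of the identity machine (0/0, 1/1) and the
# 1's-complement machine (0/1, 1/0).
IDENTITY = {("q0", "0"): ("q0", "0"), ("q0", "1"): ("q0", "1")}
COMPLEMENT = {("q0", "0"): ("q0", "1"), ("q0", "1"): ("q0", "0")}


def run_mealy(machine, start, word):
    """Return the output string printed while `word` is processed from `start`."""
    state, printed = start, []
    for letter in word:
        state, out = machine[(state, letter)]
        printed.append(out)
    return "".join(printed)


def squared(machine, start, word):
    """(Me)^2: feed the first output straight back in as a second input."""
    return run_mealy(machine, start, run_mealy(machine, start, word))


def has_identity_property(machine, start, max_len=8):
    """Test (Me)^2 = identity on every bit string up to max_len (a check, not a proof)."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if squared(machine, start, word) != word:
                return False
    return True


print(has_identity_property(IDENTITY, "q0"))    # True
print(has_identity_property(COMPLEMENT, "q0"))  # True
```

A machine that prints 0 no matter what it reads fails the same check, because its second output can never reproduce a 1.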

14. Let Me₁ be the identity Mealy machine that looks like this:



Let Me₂ be the 1's-complement Mealy machine pictured below:



Prove that both (Me₁)² and (Me₂)² have the identity property that the result of processing
any bit string is the original string again.
15. Show that the following machine also has this identity property:


[Figure: Mealy machine with edge labels 0/0, 1/1 and 0/1, 1/0]



16. Find yet another Mealy machine with this identity property.

For Problems 17 and 18, similarly, given two Mealy machines, let (Me₁)(Me₂) mean that an
input string is processed on Me₁ and then the output string is immediately fed into Me₂ (as
input) and reprocessed. Only this second resultant output is considered the final output of
(Me₁)(Me₂). If the final output string is the same as the original input string, we say that
(Me₁)(Me₂) has the identity property, symbolically written (Me₁)(Me₂) = identity.

Given two specific machines such that (Me₁)(Me₂) reproduces the original bit string, we
aim to prove (in the following two problems) that (Me₂)(Me₁) must necessarily also have this
property.

17. Show that the 2ⁿ possible n-bit strings, when fed into Me₁, give 2ⁿ different outputs.

18. Take the equality (Me₁)(Me₂) = identity. Multiply both sides by Me₁ to get
(Me₁)(Me₂)(Me₁) = identity(Me₁) = Me₁. This means that (Me₂)(Me₁) takes all outputs
from Me₁ and leaves them unchanged. Show that this observation completes the proof.

19. You are given these two Mealy machines:

[Figure: two Mealy machines, with edge labels 0/0, 1/1 and 0/1, 1/0]



Notice that they are indeed different and show that each is the inverse machine of the other,
which means that

(Me₁)(Me₂) = identity = (Me₂)(Me₁)

20. Prove that there is no Mealy machine that reverses an input string, that is,

Me(s) = transpose(s).


CHAPTER 9


Regular Languages


CLOSURE PROPERTIES 

A language that can be defined by a regular expression is called a regular language. In the
next chapter, we address the important question, "Are all languages regular?" The answer is
no. But before beginning to worry about how to prove this fact, we shall discuss in this chapter
some of the properties of the class of all languages that are regular.

The information we already have about regular languages is summarized in the following
theorem.

THEOREM 10 

If L₁ and L₂ are regular languages, then L₁ + L₂, L₁L₂, and L₁* are also regular languages.

Remark 

L₁ + L₂ means the language of all words in either L₁ or L₂. L₁L₂ means the language of all
words formed by concatenating a word from L₁ with a word from L₂. L₁* means strings that
are the concatenation of arbitrarily many factors from L₁. The result stated in this theorem is
often expressed by saying: The set of regular languages is closed under union, concatenation,
and Kleene closure.

PROOF 1 (by regular expressions) 

If L₁ and L₂ are regular languages, there are regular expressions r₁ and r₂ that define these
languages. Then (r₁ + r₂) is a regular expression that defines the language L₁ + L₂. The
language L₁L₂ can be defined by the regular expression r₁r₂. The language L₁* can be defined by
the regular expression (r₁)*. Therefore, all three of these sets of words are definable by regular
expressions and so are themselves regular languages. ■
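Because Proof 1 only glues expressions together, it can be sketched as string operations on patterns. This is an illustration under assumptions: it uses Python's `re` syntax, where `|` plays the role of the book's `+`, and it wraps every operand in a non-capturing group `(?:...)` so that precedence is preserved. The sample languages (words beginning with a, words ending with b) and the helper names are hypothetical choices, not taken from the text.

```python
import re


def union(r1, r2):
    """(r1 + r2): a word matches if it matches either operand."""
    return f"(?:(?:{r1})|(?:{r2}))"


def concat(r1, r2):
    """r1 r2: a word from L1 followed by a word from L2."""
    return f"(?:{r1})(?:{r2})"


def star(r1):
    """(r1)*: any number (including zero) of factors from L1."""
    return f"(?:{r1})*"


def accepts(pattern, word):
    # fullmatch: the pattern must describe the entire word, not just a substring.
    return re.fullmatch(pattern, word) is not None


R1 = "a(?:a|b)*"  # hypothetical L1: words that begin with a
R2 = "(?:a|b)*b"  # hypothetical L2: words that end with b

print(accepts(union(R1, R2), "bb"))   # True: ends with b
print(accepts(concat(R1, R2), "ab"))  # True: "a" from L1, then "b" from L2
print(accepts(star(R1), ""))          # True: zero factors
print(accepts(star(R1), "ba"))        # False: no factor can start with b
```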

The proof of Theorem 10 above uses the fact that L₁ and L₂ must be definable by regular
expressions if they are regular languages. Regular languages can also be defined in terms of
machines, and as it so happens, machines can also be used to prove this theorem.
PROOF 2 (by machines)


Starting at the − of TG₁, our only option is to follow a path on TG₁; starting at the − of
TG₂, we can only follow a path on TG₂. Starting at the new − state, we can follow a path on one
machine or the other; once there, we stay there. This machine proves that L₁ + L₂ is regular.

The TG described below accepts the language L₁L₂:

[Figure: TG for L₁L₂]

where 1 is the former + of TG₁ and 2 is the former − of TG₂.

The TG described below accepts the language L₁*:

[Figure: TG for L₁*]


THEOREM 11

If L is a regular language, then L' is also a regular language. In other words, the set of regular
languages is closed under complementation.


PROOF

If L is a regular language, we know from Kleene's theorem that there is some FA that accepts
the language L. Some of the states of this FA are final states and, most likely, some are
not. Let us reverse the final status of each state; that is, if it was a final state, make it a nonfinal
state, and if it was a nonfinal state, make it a final state. If an input string formerly ended
in a nonfinal state, it now ends in a final state and vice versa. This new machine we have
built accepts all input strings that were not accepted by the original FA (all the words in L')
and rejects all the input strings that the FA used to accept (the words in L). Therefore, this
machine accepts exactly the language L'. So, by Kleene's theorem, L' is regular. ■
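The complement construction changes nothing but the set of final states. The Python sketch below assumes a complete FA, with exactly one edge per letter out of every state as the book's definition of an FA requires; a machine with missing edges would first need a trap state added, or some rejected strings would be lost. The dictionary encoding of the three-state double-a machine (states s, a1, aa) and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical encoding of the double-a machine: "s" is the start state,
# "a1" means "just read one a", and "aa" is the absorbing final state.
FA1 = {
    "start": "s",
    "final": {"aa"},
    "delta": {
        ("s", "a"): "a1", ("s", "b"): "s",
        ("a1", "a"): "aa", ("a1", "b"): "s",
        ("aa", "a"): "aa", ("aa", "b"): "aa",
    },
}


def accepts(fa, word):
    state = fa["start"]
    for letter in word:
        state = fa["delta"][(state, letter)]  # exactly one edge per letter
    return state in fa["final"]


def complement(fa):
    """Reverse the final status of every state; the edges are unchanged."""
    states = {state for (state, _letter) in fa["delta"]}
    return {"start": fa["start"], "final": states - fa["final"], "delta": fa["delta"]}


def all_words(max_len, alphabet="ab"):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


FA1_PRIME = complement(FA1)
print(sorted(FA1_PRIME["final"]))  # ['a1', 's']
```

As in the proof, the start state s changes from nonfinal to final, so the null word, which has no double a, is accepted by the complement machine.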


gg-'Notice that even the final status of the 

\MPLE 


state gets reversed: 


I x'MOX ' 


that accepts only the strings aba and abb is shown below: 


that accepts all strings other than aba and abb is shown on the next page. 


Complements and Intersections 173

EXAMPLE

If L is the language over the alphabet Σ = {a b} of all words that have a double a in them,
then L' is the language of all words that do not have a double a. ■

It is important to specify the alphabet Σ, or else the complement of L might contain cat,
dog, frog, because these are definitely not strings in L.

Notice that the complement of the language L' is the language L. We could write this as

$(L')' = L$

This is a theorem from set theory that is not restricted only to languages.








THEOREM 12

If L₁ and L₂ are regular languages, then L₁ ∩ L₂ is also a regular language. In other words,
the set of regular languages is closed under intersection.


PROOF

By DeMorgan's law for sets of any kind (regular languages or not):

$L_1 \cap L_2 = (L_1' + L_2')'$

This is illustrated by the Venn diagrams below:

[Figure: Venn diagrams illustrating DeMorgan's law]

This means that the language L₁ ∩ L₂ consists of all words that are not in either L₁' or L₂'.
Because L₁ and L₂ are regular, so are L₁' and L₂'. Since L₁' and L₂' are regular, so is
L₁' + L₂'. And because L₁' + L₂' is regular, so is (L₁' + L₂')', which means L₁ ∩ L₂ is regular. ■

This is a case of "the proof is quicker than the eye." When we start with two languages
L₁ and L₂, which are known to be regular because they are defined by FAs, finding the FA for
L₁ ∩ L₂ is not as easy as the proof makes it seem. If L₁ and L₂ are defined by regular
expressions, finding L₁ ∩ L₂ can be even harder. However, all the algorithms that we need for
these constructions have already been developed.


EXAMPLE

Let us work out one example in complete detail. We begin with two languages over Σ = {a b}:

L₁ = all strings with a double a
L₂ = all strings with an even number of a's




These languages are not the same, because aaa is in L₁ but not in L₂, and aba is in L₂ but not
in L₁. They are both regular languages because they are defined by the following regular
expressions (among others):

$r_1 = (a + b)^*aa(a + b)^*$
$r_2 = b^*(ab^*ab^*)^*$




The regular expression r₂ is somewhat new to us. A word in the language L₂ can have some
b's in the front, but then whenever there is an a, it is balanced (after some b's) by another
a. This gives us factors of the form (ab*ab*). The word can have as many factors of this
form as it wants. It can end in an a or a b.
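The claims that r₁ and r₂ define L₁ and L₂ can be spot-checked mechanically. The Python sketch below is an illustration, not a proof: it rewrites the book's + as `re`'s `|` and compares each pattern with a direct test of the English definition on every word over {a, b} of length at most 10. The helper names `words` and `agrees` are hypothetical.

```python
import re
from itertools import product

R1 = r"(?:a|b)*aa(?:a|b)*"  # r1 = (a + b)*aa(a + b)*
R2 = r"b*(?:ab*ab*)*"       # r2 = b*(ab*ab*)*


def words(max_len):
    """Every string over {a, b} of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


def agrees(pattern, predicate, max_len=10):
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for w in words(max_len))


print(agrees(R1, lambda w: "aa" in w))              # True
print(agrees(R2, lambda w: w.count("a") % 2 == 0))  # True
```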

Because these two languages are regular, Kleene's theorem says that they can also be defined
by FAs. The two smallest of these are

[Figure: three-state FA₁ for L₁ and two-state FA₂ for L₂]











In the first machine, we stay in the start state until we read our first a; then we move to
the middle state. This is our opportunity to find a double a. If we read another a from the input
string while in the middle state, we move to the final state, where we remain. If we miss
this chance and read a b, we go back to −. If we never get past the middle state, the word has
no double a and is rejected. We have seen this before.

The second machine switches from the left state to the right state or from the right
state to the left state every time it reads an a. It ignores all b's. If the string begins on the
left and ends on the left, it must have made an even number of left/right switches. Therefore,
the strings this machine accepts are exactly those in L₂. We have also seen this before.




The first step in building the machine (and regular expression) for L₁ ∩ L₂ is to find
the machines that accept the complementary languages L₁' and L₂'. Although it is not necessary
for the successful execution of the algorithm, the English description of these languages
is:

L₁' = all strings that do not contain the substring aa
L₂' = all strings having an odd number of a's

In the proof of the theorem that the complement of a regular language is regular, we
found the algorithm for building the machines that accept these languages. All that we have to
do is reverse what is a final state and what is not a final state. The machines for these
languages are then

[Figure: FA₁' and FA₂']
Even if we are going to want both the regular expression and the FA for the intersection
language, we do not need to find the regular expressions that go with these two complementary
machines. However, it is good exercise, and the algorithm for doing this was presented as
part of the proof of Kleene's theorem. Recall that we go through stages of transition graphs
with edges labeled by regular expressions. FA₁' becomes


[Figure: transition graph for FA₁'; one edge is labeled a + b]


State 3 is part of no path from − to +, so it can be dropped. To bypass state 2, we need
to join the incoming a-edge with both outgoing edges (the b-edge to 1 and the Λ-edge to +).
When we add the two loops, we get b + ab, and the sum of the two edges from 1 to + is a + Λ.
Now the machine looks like this:


[Figure: the graph after bypassing state 2; the loop at 1 is labeled b + ab]


The last step is to bypass state 1. To do this, we concatenate the incoming Λ-label with the
loop label starred, (b + ab)*, and with the outgoing (a + Λ)-label to produce an
edge from − to + with the regular expression for L₁'.

$r_1' = (b + ab)^*(a + \Lambda)$
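The bypass steps above follow one rule: a path p → k → q through an eliminated state k becomes a single edge labeled (incoming)(loop)*(outgoing), and parallel edges are summed. The sketch below applies that rule mechanically to the transition graph just described. The helper `eliminate` is hypothetical, and the labels are written in Python `re` syntax (`|` for +, the empty pattern for Λ) so that the result can be checked against the English description of L₁' by brute force.

```python
import re
from itertools import product

# Edge labels are Python regex strings; "" stands for the book's lambda.


def wrap(r):
    return f"(?:{r})"


def eliminate(edges, k):
    """Bypass state k: each path p -> k -> q becomes one edge in (loop)* out."""
    loop = edges.pop((k, k), None)
    middle = wrap(loop) + "*" if loop is not None else ""
    incoming = {p: r for (p, q), r in edges.items() if q == k}
    outgoing = {q: r for (p, q), r in edges.items() if p == k}
    for edge in [e for e in edges if k in e]:
        del edges[edge]
    for p, r_in in incoming.items():
        for q, r_out in outgoing.items():
            new = wrap(r_in) + middle + wrap(r_out)
            old = edges.get((p, q))
            # Parallel edges are summed, i.e. joined with "|".
            edges[(p, q)] = new if old is None else wrap(old) + "|" + wrap(new)
    return edges


# FA1' as a transition graph with a new start "-" and a new final "+";
# state 3 is already dropped because it lies on no path from - to +.
edges = {
    ("-", "1"): "", ("1", "1"): "b", ("1", "2"): "a",
    ("2", "1"): "b", ("2", "+"): "", ("1", "+"): "",
}
eliminate(edges, "2")
eliminate(edges, "1")
R1_PRIME = edges[("-", "+")]  # equivalent to (b|ab)*(a|), i.e. (b + ab)*(a + Λ)


def matches_no_aa(max_len=10):
    for n in range(max_len + 1):
        for w in map("".join, product("ab", repeat=n)):
            if (re.fullmatch(R1_PRIME, w) is not None) != ("aa" not in w):
                return False
    return True


print(matches_no_aa())  # True
```

The generated pattern is longer than (b + ab)*(a + Λ) because every label is wrapped in a group, but it describes the same language.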

Let us now do the same thing for the language L₂'. FA₂' becomes

[Figure: transition graph for FA₂']




We start the simplification of this picture by eliminating state 2. There is one incoming
edge, a loop, and two outgoing edges, so we need to replace them with only two edges: the path
1-2-2-1 becomes a loop at 1 and the path 1-2-2-+ becomes an edge from 1 to +.
After bypassing state 2 and adding the two loop labels, we have


[Figure: the graph after bypassing state 2; the loop at 1 is labeled b + ab*a]


We can now eliminate state 1 and we have

[Figure: a single edge from − to + labeled (b + ab*a)*ab*]


which gives us the regular expression

$r_2' = (b + ab^*a)^*ab^*$

This is one of several regular expressions that define the language of all words with an
odd number of a's. Another is

$b^*ab^*(ab^*ab^*)^*$

which we get by adding the factor b*a in front of the regular expression for L₂. This works
because words with an odd number of a's can be interpreted as b*a in front of words with an
even number of a's. The fact that these two different regular expressions define the same
language is not obvious. The question "How can we tell when two regular expressions are
equal?" will be answered in Chapter 11.
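Until Chapter 11 supplies a real decision procedure, a brute-force comparison can at least catch a mistake. The Python sketch below (hypothetical helper `first_difference`, `re` syntax with `|` for +) looks for the shortest word of length at most 12 on which the two expressions disagree. Finding none is evidence of equality, not a proof.

```python
import re
from itertools import product

R_A = r"(?:b|ab*a)*ab*"    # (b + ab*a)*ab*, from the elimination above
R_B = r"b*ab*(?:ab*ab*)*"  # b*ab*(ab*ab*)*


def first_difference(p, q, max_len):
    """Return the shortest word on which p and q disagree, or None if none is found."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return w
    return None


print(first_difference(R_A, R_B, 12))  # None: they agree on all 8191 words tested
```

Comparing R_A with the even-a expression b*(ab*ab*)* instead returns the null word at once, since it has zero a's.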

We now have regular expressions for L₁' and L₂', so we can write the regular expression
for L₁' + L₂'. This will be

$r_1' + r_2' = (b + ab)^*(a + \Lambda) + (b + ab^*a)^*ab^*$

We must now go in the other direction and make this regular expression into an FA so that
we can take its complement to get the FA that defines L₁ ∩ L₂.

To build the FA that corresponds to a complicated regular expression is no picnic, as we
remember from the proof of Kleene's theorem, but it can be done. However, not by anybody
as reasonable as ourselves. Clever people like us can always find a better way.

An alternative approach is to make the machine for L₁' + L₂' directly from the machines
for L₁' and L₂' without resorting to regular expressions.
Let us label the states in the two machines FA₁' and FA₂' as shown:


[Figure: FA₁' with states x₁, x₂, x₃ and FA₂' with states y₁, y₂]

where the start states are x₁ and y₁, and the final states are x₁, x₂, and y₂. The six possible
combination states are

z₁ = x₁ or y₁   start, final (words ending here are accepted on FA₁')
z₂ = x₁ or y₂   final (words ending here are accepted on FA₁' and FA₂')
z₃ = x₂ or y₁   final (words ending here are accepted on FA₁')
z₄ = x₂ or y₂   final (words ending here are accepted on FA₁' and FA₂')
z₅ = x₃ or y₁   nonfinal (words ending here are not accepted on either machine)
z₆ = x₃ or y₂   final (words ending here are accepted on FA₂')
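The list of combination states can be generated rather than written by hand. The sketch below uses hypothetical Python encodings of the state names x₁, x₂, x₃ and y₁, y₂ and of their final statuses as stated in the text. It pairs every x-state with every y-state and marks a pair final when either component is final, which is the union rule needed for L₁' + L₂'.

```python
from itertools import product

# Hypothetical encoding of the labels used in the text.
X_STATES = ["x1", "x2", "x3"]  # FA1'; x1 is its start state
Y_STATES = ["y1", "y2"]        # FA2'; y1 is its start state
X_FINAL = {"x1", "x2"}
Y_FINAL = {"y2"}


def union_states():
    """List every pairing (x, y) as z1, z2, ... with its status in the union machine."""
    rows = []
    for i, (x, y) in enumerate(product(X_STATES, Y_STATES), start=1):
        is_start = (x, y) == (X_STATES[0], Y_STATES[0])
        is_final = x in X_FINAL or y in Y_FINAL  # union: final on either machine
        rows.append((f"z{i}", x, y, is_start, is_final))
    return rows


for name, x, y, is_start, is_final in union_states():
    status = ("start, " if is_start else "") + ("final" if is_final else "nonfinal")
    print(f"{name} = {x} or {y}: {status}")
```

Only z₅ = x₃ or y₁ comes out nonfinal, matching the list above.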


(odd number of a's, not doubled)(first aa)(odd number of a's, possibly doubled)


Notice that the first factor must end in b, because none of its a's are part of a double a.


= [(b + abb*ab)*abb*]aa[b*a(b + ab*a)*] 
= (b + abb*ab)*(a)(bb*aab*a)(b + ab*a)* 
= type 2 




Adding type 1 and type 2 together (and factoring out like terms using the distributive
law), we obtain the same expression we got from the algorithm. We now have two proofs
that this is indeed a regular expression for the language L₁ ∩ L₂.
This completes the calculation that was started on p. 174.


The proofs of the last three theorems are a tour de force of technique. The first was
proved by regular expressions and TGs, the second by FAs, and the third by a Venn diagram.
We must confess now that the proof of the theorem that the intersection of two regular
languages is again a regular language was an evil pedagogical trick. The theorem is not really
as difficult as we made it seem. We chose the hard way to do things because it is a
good example of mathematical thinking: Reduce the problem to elements that have already
been solved.

This procedure is reminiscent of a famous story about a theoretical mathematician.
Professor X is surprised one day to find his desk on fire. He grabs the extinguisher and puts out
the flames. The next day, he looks up from his book to see that his wastepaper basket is on
fire. Quickly, he takes the basket and empties it onto his desk, which begins to burn. Having
thus reduced the problem to one he has already solved, he goes back to his reading. (Students
who find this funny are probably the ones who have been setting the fires.)

The following is a more direct proof that the intersection of two regular languages is
regular.


GOOD PROOF OF THEOREM 12 


Let us recall the method we introduced to produce the union-machine FA₃ that accepts any
string accepted by either FA₁ or FA₂.

To prove this, we showed how to build a machine with states z₁, z₂, . . . of the form
"x something if the input is running on FA₁, or y something if the input is running on FA₂."
If either the x-state or the y-state was a final state, we made the z-state a final state.

Let us now build the exact same machine FA₃, but let us change the designation of final
states. Let the z-state be a final state only if both the corresponding x-state and the
corresponding y-state are final states. Now FA₃ accepts only strings that reach final states
simultaneously on both machines.

The words in the language for FA₃ are words in both the languages for FA₁ and FA₂. FA₃
is therefore a machine for the intersection language. ■
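The good proof translates almost line for line into code. The sketch below assumes complete FAs stored as Python tuples (start, finals, transitions); the encodings of the double-a machine and the even-a machine (states L and R for left and right) and the helper names are hypothetical. It builds only the pair states reachable from the start pair, and the single flag `intersection` switches between the new final-state rule (both components final) and the old union rule (either component final).

```python
from collections import deque
from itertools import product

# Complete FAs as (start, finals, delta); delta[(state, letter)] -> state.
FA1 = ("s", {"aa"}, {("s", "a"): "a1", ("s", "b"): "s",
                      ("a1", "a"): "aa", ("a1", "b"): "s",
                      ("aa", "a"): "aa", ("aa", "b"): "aa"})
FA2 = ("L", {"L"}, {("L", "a"): "R", ("L", "b"): "L",
                     ("R", "a"): "L", ("R", "b"): "R"})


def product_fa(m1, m2, intersection=True, alphabet="ab"):
    """Run m1 and m2 in lockstep on pair states z = (x, y), keeping only reachable ones."""
    (s1, f1, d1), (s2, f2, d2) = m1, m2
    start = (s1, s2)
    delta, seen, todo = {}, {start}, deque([start])
    while todo:
        x, y = todo.popleft()
        for c in alphabet:
            nxt = (d1[(x, c)], d2[(y, c)])
            delta[((x, y), c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    if intersection:  # final only if final on both machines
        finals = {(x, y) for (x, y) in seen if x in f1 and y in f2}
    else:             # union: final if final on either machine
        finals = {(x, y) for (x, y) in seen if x in f1 or y in f2}
    return start, finals, delta


def accepts(fa, word):
    state, finals, delta = fa
    for c in word:
        state = delta[(state, c)]
    return state in finals


def agrees(fa, predicate, max_len=10):
    return all(accepts(fa, "".join(w)) == predicate("".join(w))
               for n in range(max_len + 1) for w in product("ab", repeat=n))


FA3 = product_fa(FA1, FA2)
print(agrees(FA3, lambda w: "aa" in w and w.count("a") % 2 == 0))  # True
```

Building only reachable pairs keeps the machine small; here all six pairs happen to be reachable, and exactly one of them, (aa, L), is final.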


Not only is the proof shorter, but also the construction of the machine has fewer steps.


EXAMPLE 


In the proof of Kleene's theorem, we took the sum of the machine that accepts words with a
double a,



The dashed lines are perfectly good edges, but they have to cross other edges. With a
little imagination, we can see how this machine accepts all EVEN-EVEN with a double a. All
north-south changes are caused by b's, all east-west by a's. To get into the inner states
takes a double a.


EXAMPLE 


Let us rework the example in the first proof once again, this time by the quick method.
This is like the citizens of the fabled city of Chelm who, on learning that they did not have
to carry all their logs down from the top of the mountain, were so overjoyed that they
carried them all back up again so that they could use the clever work-saving method of
rolling them down.

L₁ = all strings with a double a
L₂ = all strings with an even number of a's


The machine that simulates the same input running on both machines at once is summarized
in a table of old states and new states:

[Table: Old States / New States]


EXAMPLE


Let us work through one last example of intersection. Our two languages will be

L₁ = all words that begin with an a
L₂ = all words that end with an a

The intersection language will be

L₁ ∩ L₂ = all words that begin and end with the letter a

The language is obviously regular because it can be defined by the regular expression

$a(a + b)^*a + a$

Note that the first term requires that the first and last a's be different, which is why we need
the second choice "+ a."
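A brute-force comparison confirms the role of the second term. In this Python sketch, which assumes `re` syntax with `|` in place of +, the expression is compared with a direct test of "begins and ends with a" on all words up to length 10, and the pattern without "+ a" is shown to miss the one-letter word a. The helper name is hypothetical.

```python
import re
from itertools import product

PATTERN = r"(?:a(?:a|b)*a|a)"  # the book's a(a + b)*a + a


def begins_and_ends_with_a(word):
    return word.startswith("a") and word.endswith("a")


ok = all(
    (re.fullmatch(PATTERN, w) is not None) == begins_and_ends_with_a(w)
    for n in range(11)
    for w in map("".join, product("ab", repeat=n))
)
print(ok)  # True

# Dropping the "+ a" term loses the one-letter word a.
print(re.fullmatch(r"a(?:a|b)*a", "a"))  # None
```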

In this example, we were lucky enough to "understand" the languages, so we could
concoct a regular expression that we "understand" represents the intersection. In general,
this does not happen, so we follow the algorithm presented in the proof, which we can execute
even without the benefit of understanding. (Although the normal quota of insights
for a human is one per year, the daily adult requirement of interpreting regular expressions
is even lower.)

For this, we must begin with FAs that define these languages: 












Because we are instead constructing the machine for

L₁ ∩ L₂ = all words in both L₁ and L₂

we put a + only in the state that represents acceptance by both machines at once.


Strings ending here are accepted if being run on FA₁ (by ending in x₂) and if being run
on FA₂ (by ending in y₂). ■

Do not be fooled by this slight confusion:

z₂ = x₂ or y₂ = accepted by FA₁ and FA₂

The poor plus sign is perilously overworked.

2 + 2        (sometimes read "2 and 2 are 4")
(a + b)*     (a or b repeated as often as we choose)
a⁺           (a string of at least one a)
L₁ + L₂      (all words in L₁ or L₂)
z₂+          (z₂ is a final state; the machine accepts input strings if they end here)

1 + 1 = 2    Arithmetic
1 + 1 = 10   Binary
1 + 1 = 0    Modulo 2
1 + 1 = 1    Boolean


If humans were not smarter than machines, they could never cope with the mess they
make of their own notation.


For each of the following pairs of regular languages, find a regular expression and an FA that
each define L₁ ∩ L₂:

     L₁                             L₂
11.  (ab*)*                         a(a + b)*
12.  (ab*)*                         (a + b)*aa(a + b)*
13.  All strings of even length     b(a + b)*
     = (aa + ab + ba + bb)*
14.  Even-length strings            (a + b)*aa(a + b)*
15.  Even-length strings            (b + ab)*(a + Λ)
16.  Odd-length strings             a(a + b)*
17.  Even-length strings            EVEN-EVEN
18.  (i) Even-length strings        Strings with an even number of a's
     (ii) Even-length strings       Strings with an odd number of a's
19.  (i) Even-length strings        Strings with an odd number of a's and an odd number of b's
     (ii) Even-length strings       Strings with an odd number of a's and an even number of b's

20. We have seen that because the regular languages are closed under union and complement, they
must be closed under intersection. Find a collection of languages that is closed under union and
intersection but not under complement.


CHAPTER 10


Nonregular Languages


THE PUMPING LEMMA

By using FAs and regular expressions, we have been able to define many languages. Although
these languages have had many different structures, they took only a few basic
forms: languages with required substrings, languages that forbid some substrings, languages
that begin or end with certain strings, languages with certain even/odd properties, and so on.
We will now turn our attention to some new forms, such as the language PALINDROME of
Chapter 3 or the language PRIME of all words aᵖ, where p is a prime number. In this chapter,
we shall see that neither of these is a regular language. We can describe them in English,
but they cannot be defined by an FA. More powerful machines are needed to define them,
machines that we build in later chapters.


DEFINITION

A language that cannot be defined by a regular expression is called a nonregular language.


By Kleene's theorem, a nonregular language also cannot be accepted by any FA or TG.
All languages are either regular or nonregular; none are both.

Let us first consider a simple case. Let us define the language L.

$L = \{\Lambda,\ ab,\ aabb,\ aaabbb,\ aaaabbbb,\ aaaaabbbbb,\ \ldots\}$

We could also define this language by the formula

$L = \{a^n b^n \text{ for } n = 0, 1, 2, 3, 4, 5, \ldots\}$

or for short

$L = \{a^n b^n\}$
When the range of the abstract exponent n is unspecified, we mean to imply that it is 0, 1, 2,
3, . . . .

We shall now show that this language is nonregular. Let us note, though, that it is a subset
of many regular languages, such as a*b*, which, however, also includes such strings as
aab and bb that {aⁿbⁿ} does not.

Let us be very careful to note that {aⁿbⁿ} is not a regular expression. It involves the
symbols { } and n that are not in the alphabet of regular expressions. This is a language-defining
expression that is not regular. Just because this is not a regular expression does not
mean that none exists; this we shall now prove.

Suppose on the contrary that this language were regular. Then there would have to be
some FA that accepts it. Let us picture one of these FAs (there might be several) in our mind.
This FA might have many states. Let us say that it has 95 states, just for the sake of argument.
Yet, we know it accepts the word a⁹⁶b⁹⁶. The first 96 letters of this input string are all
a's and they trace a path through this machine. The path cannot visit a new state with each
input letter read because there are only 95 states. Therefore, at some point the path returns to
a state that it has already visited. The first time it was in that state it left by the a-road. The
second time it is in that state it leaves by the a-road again. Even if it only returns once, we
say that the path contains a circuit in it. (A circuit is a loop that can be made of several
edges.) First, the path wanders up to the circuit and then it starts to loop around the circuit,
maybe many times. It cannot leave the circuit until a b is read from the input. Then the path
can take a different turn. In this hypothetical example, the path could make 30 loops around
a three-state circuit before the first b is read.

After the first b is read, the path goes off and does some other stuff following b-edges
and eventually winds up at a final state where the word a⁹⁶b⁹⁶ is accepted.

Let us, for the sake of argument again, say that the circuit that the a-edge path loops
around has seven states in it. The path enters the circuit, loops around it madly, and then
goes off on the b-line to a final state. What would happen to the input string a⁹⁶⁺⁷b⁹⁶?


Just as in the case of the input string a⁹⁶b⁹⁶, this string would produce a path through the
machine that would walk up to the same circuit (reading only a's) and begin to loop
around it in exactly the same way. However, the path for a⁹⁶⁺⁷b⁹⁶ loops around this circuit
one more time than the path for a⁹⁶b⁹⁶, precisely one extra time. Both paths, at exactly
the same state in the circuit, begin to branch off on the b-road. Once on the b-road,
they both go the same 96 b-steps and arrive at the same final state. But this would mean
that the input string a¹⁰³b⁹⁶ is accepted by this machine. However, that string is not in the
language L = {aⁿbⁿ}.

This is a contradiction. We assumed that we were talking about an FA that accepts exactly
the words in L and then we were able to prove that the same machine accepts some
word that is not in L. This contradiction means that the machine that accepts exactly the
words in L does not exist. In other words, L is nonregular.

Let us review what happened. We chose a word in L that was so large (had so many letters)
that its path through the FA had to contain a circuit. Once we found that some path with
a circuit could reach a final state, we asked ourselves what happens to a path that is just like
the first one, but that loops around the circuit one extra time and then proceeds identically
through the machine. The new path also leads to the same final state, but it is generated by a
different input string, an input string not in the language L.

Perhaps the following picture can be of some help in understanding the idea behind this
discussion. Let the path for a⁹b⁹ be
[Figure: path of a⁹b⁹ through the FA]


We have not indicated all the edges in this FA, only those used in the path of the word
a⁹b⁹. State 6 is the only state for which we see both an a-exit edge and a b-exit edge.

In the path this input string takes to acceptance, we find two circuits: the a-circuit
3-4-5-6 and the b-circuit 9-10. Let us concentrate on the a-circuit. What would be the path
through this FA of the input string a¹³b⁹? The path for a¹³b⁹ would begin with the same nine
steps as the path for a⁹b⁹, ending after nine steps in state 6. The input string a⁹b⁹ now gives us
a b to read, which makes us go to state 7. However, the path for a¹³b⁹ still has four more
a-steps to take, which is one more time around the circuit, and then it follows the nine b-steps.

The path for a¹³b⁹ is shown below:

[Figure: path of a¹³b⁹ through the FA]


Let us return to our first consideration.

With the assumptions we made above (that there were 95 states and that the circuit was
7 states long), we could also say that a¹¹⁰b⁹⁶, a¹¹⁷b⁹⁶, a¹²⁴b⁹⁶, . . . are also accepted by this
machine.

They can all be written in this form:

$a^{96}(a^7)^m b^{96}$

where m is any integer 0, 1, 2, 3, . . . . If m is 0, the path through this machine is the path
for the word a⁹⁶b⁹⁶. If m is 1, the path looks the same, but it loops the circuit one more time.
If m = 2, the path loops the circuit two more times. In general, a⁹⁶(a⁷)ᵐb⁹⁶ loops the circuit
exactly m more times. After doing this looping, it gets off the circuit at exactly the same
place a⁹⁶b⁹⁶ does and proceeds along exactly the same route to the final state. All these
words with m ≥ 1, though not in L, must be accepted.

Suppose that we had considered a different machine to accept the language L, perhaps a
machine that has 732 states. When we input the word a⁷³³b⁷³³, the path that the a's take must
contain a circuit. We choose the word a⁷³³b⁷³³ to be efficient. The word a⁹⁹⁹⁹b⁹⁹⁹⁹ also must
loop around a circuit in its a-part of the path. Suppose the circuit that the a-part follows has
101 states. Then a⁸³⁴b⁷³³ would also have to be accepted by this machine, because its
path is the same in every detail except that it loops the circuit one more time. This second
machine must also accept some strings that are not in L:

a 834 b 733 a 935 b m a l036 b 733 . 

= a 133 {a m ) m b 733 for m = 1 2 3 




For each different machine we propose for L, there is a different counterexample proving that it accepts more than just the language L.

There are machines that include L in the language they accept, but for each of them there are infinitely many extra words they must also accept.

All in all, we can definitely conclude that there is no FA that accepts all the strings in L and only the strings in L. Therefore, L is nonregular.

The reason why we cannot find an FA that accepts L is not because we are stupid, but because none can exist.

The principle we have been using to discuss the language L above can be generalized so that it applies to consideration of other languages. It is a tool that enables us to prove that certain other languages are also nonregular. We shall now present the generalization of this idea, called the pumping lemma for regular languages, which was discovered by Yehoshua Bar-Hillel, Micha A. Perles, and Eliahu Shamir in 1961.

The name of this theorem is interesting. It is called “pumping” because we pump more stuff into the middle of the word, swelling it up without changing the front and the back parts of the string. It is called a “lemma” because, although it is a theorem, its main importance is as a tool in proving other results of more direct interest; namely, it will help us prove that certain specific languages are nonregular.

THEOREM 13

Let L be any regular language that has infinitely many words. Then there exist some three strings x, y, and z (where y is not the null string) such that all the strings of the form

xy^n z for n = 1 2 3 ...

are words in L.


PROOF 


If L is a regular language, then there is an FA that accepts exactly the words in L. Let us focus on one such machine. Like all FAs, this machine has only finitely many states. But L has infinitely many words in it. This means that there are arbitrarily long words in L. (If there were some maximum on the length of all the words in L, then L could have only finitely many words in total.)

Let w be some word in L that has more letters in it than there are states in the machine we are considering. When this word generates a path through the machine, the path cannot visit a new state for each letter because there are more letters than states. Therefore, it must at some point revisit a state that it has been to before. Let us break the word w up into three parts:

Part 1 Call part x all the letters of w starting at the beginning that lead up to the first state that is revisited. Notice that x may be the null string if the path for w revisits the start state as its first revisit.

Part 2 Starting at the letter after the substring x, let y denote the substring of w that travels around the circuit coming back to the same state the circuit began with. Because there must be a circuit, y cannot be the null string; y contains the letters of w for exactly one loop around this circuit.

Part 3 Let z be the rest of w starting with the letter after the substring y and going to the end of the string w. This z could be null. The path for z could also possibly loop around the y-circuit or any other. What z does is arbitrary.

Clearly, from the definition of these three substrings 

w = xyz

and w is accepted by this machine. 

What is the path through this machine of the input string 

xyyz? 

It follows the path for w in the first part x and leads up to the beginning of the place where w looped around a circuit. Then, like w, it inputs the string y, which causes the machine to loop back to this same state again. Then it inputs the string y a second time, which causes the machine to loop back to this same state yet another time. Then, just like w, it proceeds along the path dictated by the input string z and so ends on the same final state that w did. This means that xyyz is accepted by this machine, and therefore it must be in the language L.

If we traced the paths for xyyz, xyyyz, and xyyyyyyyyyyyyz, they would all follow the same pattern: proceed up to the circuit, loop around the circuit some number of times, then proceed to the final state. All these must be accepted by the machine and therefore are all in the language L. In fact, L must contain all strings of the form:

xy^n z for n = 1 2 3 ...

as the theorem claims.

Perhaps these pictures can be helpful in understanding the argument above:

[Figure: the paths illustrating the decomposition w = xyz]

Notice that in this theorem it does not matter whether there is another circuit traced in the z-part or not. All we need to do is find one circuit, and then we keep pumping it for all it is worth.

Notice also that we did not assume that the x-, y-, or z-parts were repetitions of the same letter, as was the case in our discussion of {a^n b^n}. They could have been any arbitrary strings.
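The proof above is really a procedure for splitting a long accepted word, so it can be run. As a minimal sketch, assuming an FA stored as a Python dictionary from (state, letter) to next state, the code traces a word, cuts it at the first revisited state exactly as in Parts 1 to 3, and checks that xy^n z is still accepted. The three-state machine (words containing aa) and all function names are hypothetical illustrations, not a machine from the text.

```python
from typing import Dict, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]

def path_of(delta: Delta, start: int, w: str) -> List[int]:
    """States visited while reading w, beginning with the start state."""
    states = [start]
    for ch in w:
        states.append(delta[(states[-1], ch)])
    return states

def split_at_first_revisit(delta: Delta, start: int, w: str) -> Tuple[str, str, str]:
    """Return (x, y, z): x leads up to the first revisited state,
    y goes once around the circuit, z is the rest of w."""
    first_visit: Dict[int, int] = {}
    for i, state in enumerate(path_of(delta, start, w)):
        if state in first_visit:
            j = first_visit[state]          # the circuit runs from position j to i
            return w[:j], w[j:i], w[i:]
        first_visit[state] = i
    raise ValueError("no state is revisited; w is not longer than the number of states")

def accepts(delta: Delta, start: int, finals: Set[int], w: str) -> bool:
    return path_of(delta, start, w)[-1] in finals

# Hypothetical FA for "contains aa": 0 = start, 1 = just read a, 2 = has seen aa (final).
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 0, (2, "a"): 2, (2, "b"): 2}

x, y, z = split_at_first_revisit(delta, 0, "abaab")
print((x, y, z))  # ('', 'ab', 'aab')
for n in range(1, 6):
    assert accepts(delta, 0, {2}, x + y * n + z)   # x y^n z is accepted for every n tried
```

Because the cut happens at the first repeated state, len(x) + len(y) never exceeds the number of states, a fact the text returns to later.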


EXAMPLE

Let us illustrate the action of the pumping lemma on a concrete example of a regular language. The machine below accepts an infinite language and has only six states:

[Figure: a six-state FA accepting an infinite language]

Any word with six or more letters must correspond to a path that includes a circuit. Some words with fewer than six letters also correspond to paths with circuits. The word we will consider in detail is

w = bbbababa

which has more than six letters and therefore includes a circuit. The path that this word generates through the FA can be decomposed into three stages.

The first part, the x-part, goes from the - state up to the first circuit. This is only one edge and corresponds to the letter b alone. The second stage is the circuit, which passes through state 5. This corresponds to edges labeled b, b, and a. We therefore say that the substring bba is the y-part of the word w. After going around the circuit, the path proceeds through the remaining states to state 6. This corresponds to the substring baba of w, which constitutes the z-part:

w = b bba baba   (x = b, y = bba, z = baba)

Now let us ask what would happen to the input string xyyz.

xyyz = b bba bba baba

This is what happens!

[Figure: the path of xyyz through the FA]

The same thing happens with xyyyz, xyyyyz, and in general for xy^n z. This is all that the pumping lemma says. ■


EXAMPLE

Suppose for a moment that we did not already have a discussion of the language

L = {a^n b^n for n = 0 1 2 3 ...}

Let us see how we could apply the pumping lemma directly to this case.

The pumping lemma says that there must be strings x, y, and z such that all words of the form

xy^n z

are in L. Is this possible? A typical word of L looks like

aaaabbbb

How do we break this into three pieces conformable to the roles x, y, and z? If the middle section, y, is going to be made entirely of a's, then when we pump it to xyyz, the word will have more a's than b's, which is not allowed in L. Similarly, if the middle part, y, is composed of only b's, then the word xyyz will have more b's than a's. The solution is that the y-part must have some positive number of a's and some positive number of b's. This would mean that y contains the substring ab. Then xyyz would have two copies of the substring ab. But every word in L contains the substring ab exactly once. Therefore, xyyz cannot be a word in L. This proves that the pumping lemma cannot apply to L and therefore L is not regular. ■
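The case analysis above can be spot-checked by brute force. The sketch below (the helper names are hypothetical) tries every split of a word of L into x, y, z with y nonempty and reports those for which xyyz is still in L. For aaaabbbb the list comes back empty, as the argument predicts; a finite check like this only illustrates the reasoning and is not itself a proof.

```python
def in_anbn(s: str) -> bool:
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def pumpable_splits(w: str, in_lang):
    """All splits w = xyz with y nonempty such that xyyz is still in the language."""
    found = []
    for i in range(len(w) + 1):
        for j in range(i + 1, len(w) + 1):
            x, y, z = w[:i], w[i:j], w[j:]
            if in_lang(x + y + y + z):
                found.append((x, y, z))
    return found

print(pumpable_splits("aaaabbbb", in_anbn))  # []
```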




EXAMPLE

Since we have shown that the language {a^n b^n} is nonregular, we can show that the language EQUAL, of all words with the same total number of a's and b's, is also nonregular. (Note that the numbers of a's and b's do not have to be even; they just have to be the same.)

EQUAL = {Λ ab ba aabb abab abba baab baba bbaa aaabbb ...}

The language {a^n b^n} is the intersection of all words defined by the regular expression a*b* and the language EQUAL:

{a^n b^n} = a*b* ∩ EQUAL

Now if EQUAL were a regular language, then {a^n b^n} would be the intersection of two regular languages and by Theorem 12 on p. 174 it would have to be regular itself. Because {a^n b^n} is not regular, EQUAL cannot be. ■
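The identity {a^n b^n} = a*b* ∩ EQUAL can be sanity-checked on all short strings. This sketch (helper names are illustrative assumptions) enumerates every string over {a, b} up to length 10 and lists any string where membership in both a*b* and EQUAL disagrees with membership in {a^n b^n}; an empty list is consistent with the identity, though of course only a proof covers all lengths.

```python
import itertools
import re

def all_strings(max_len, alphabet="ab"):
    """Every string over the alphabet of length 0..max_len."""
    for k in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=k):
            yield "".join(letters)

def in_equal(s):
    return s.count("a") == s.count("b")

def in_anbn(s):
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

mismatches = [s for s in all_strings(10)
              if (re.fullmatch(r"a*b*", s) is not None and in_equal(s)) != in_anbn(s)]
print(mismatches)  # []
```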


For the example {a^n b^n}, and in most common instances, we do not need the full force of the pumping lemma as stated. It is often just as decisive to say that w can be decomposed into xyz, where xyyz is also in the language. The fact that xy^n z is in the language for all n > 2 is interesting and will be quite useful when we discuss whether certain languages are finite or infinite, but often n = 2 is adequate to show that a given language is nonregular.


EXAMPLE

Consider the language a^n b a^n = {b aba aabaa ...}. If this language were regular, then there would exist three strings x, y, and z such that xyz and xyyz were both words in this language. We can show that this is impossible:



Observation 1: If the y string contained the b, then xyyz would contain two b's, which no word in this language can have.

Observation 2: If the y string is all a's, then the b in the middle of the word is in the x-side or the z-side. In either case, xyyz has increased the number of a's either before the b or after the b, but not both.

Conclusion 1: Therefore, xyyz does not have its b in the middle and is not of the form a^n b a^n.

Conclusion 2: This language cannot be pumped and is therefore not regular.


EXAMPLE

Consider the language a^n b^n a b^{n+1} for n = 1, 2, 3, .... The first two words of this language are ababb and aabbabbb. We are going to show that this language too is nonregular by showing that if xyz is in this language for any three strings x, y, and z, then xyyz is not in this language:

Observation 1: For every word in this language, if we know the total number of a's, we can calculate the exact number of b's (twice the total number of a's, minus 1). And conversely, if we know the total number of b's, we can uniquely calculate the number of a's (add 1 and divide by 2). So, no two different words have the same number of a's or the same number of b's.

Observation 2: All words in this language have exactly two substrings equal to ab and one equal to ba.

Observation 3: If xyz and xyyz are both in this language, then y cannot contain either the substring ab or the substring ba because then xyyz would have too many.

Conclusion 1: Because y cannot be Λ, it must be a solid clump of a's or a solid clump of b's; any mixture contains the substrings forbidden to it in observation 3.

Conclusion 2: If y is solid a's, then xyz and xyyz are different words with the same number of b's, violating observation 1. If y is solid b's, then xyz and xyyz are different words with the same number of a's, violating observation 1.

Conclusion 3: It is impossible for both xyz and xyyz to be in this language for any strings x, y, and z. Therefore, the language is unpumpable and not regular.

The proof that we gave of the pumping lemma actually proved more than was explicitly stated in the lemma. By the method of proof that we used, we showed additionally that the string x and the string y together do not have any more letters than the machine in question has states. This is because as we proceed through x and y, we visit our first repeated state at the end of y; before that, all the states were entered only once each.

The same argument that proved Theorem 13 (see p. 190) proves the stronger theorem below.


THEOREM 14

Let L be an infinite language accepted by a finite automaton with N states. Then for all words w in L that have more than N letters, there are strings x, y, and z, where y is not null and length(x) + length(y) does not exceed N, such that

w = xyz

and all strings of the form

xy^n z for n = 1 2 3 ...

are in L. ■

We put the end-of-proof symbol ■ right after the statement of the theorem to indicate that we have already provided a proof of this result.

The purpose of stressing the question of length is illustrated by our next example.


EXAMPLE

We shall show that the language PALINDROME is nonregular. We cannot use the first version of the pumping lemma to do this because the strings

x = a, y = b, z = a

satisfy the lemma and do not contradict the language. All words of the form

xy^n z = ab^n a

are in PALINDROME.

However, let us consider one of the FAs that might accept this language. Let us say that the machine we have in mind has 77 states. Now the palindrome

w = a^{80} b a^{80}

must be accepted by this machine because it is a palindrome. Because it has more letters than the machine has states, we can break w into the three parts: x, y, and z. But because the length of x and y must be in total 77 or less, they must both be made of solid a's, because the first 77 letters of w are all a's. That means when we form the word xyyz, we are adding more a's to the front of w. But we are not adding more a's to the back of w because all the rear a's are in the z-part, which stays fixed at 80 a's. This means that the string xyyz is not a palindrome because it will be of the form

a^{more than 80} b a^{80}

But the second version of the pumping lemma says that PALINDROME has to include this string. Therefore, the second version does not apply to the language PALINDROME, which means that PALINDROME is nonregular.

Obviously, this demonstration did not really rely on the number of states in the hypothetical machine being 77. Some people think that this argument would be more mathematically sound if we called the number of states m. This is silly. ■
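The last three arguments can be spot-checked mechanically. In the sketch below (function names are hypothetical), pump_splits tries every split w = xyz with y nonempty and keeps those for which both xyz and xyyz are in the language; the optional max_xy bound on len(x) + len(y) imposes the extra condition of Theorem 14. For a^n b a^n, a^n b^n a b^{n+1}, and PALINDROME with N = 77, no split survives. Without the Theorem 14 bound, the palindrome a^{80} b a^{80} can be pumped (take y = b), which is why the first version of the lemma was not enough. Checking a few words illustrates the reasoning; it does not replace it.

```python
def pump_splits(w, in_lang, max_xy=None):
    """Splits w = xyz with y nonempty such that xyz and xyyz are both in the
    language. If max_xy is given, only splits with len(x) + len(y) <= max_xy
    are tried (the extra condition of Theorem 14)."""
    if not in_lang(w):
        return []
    limit = len(w) if max_xy is None else min(max_xy, len(w))
    found = []
    for j in range(1, limit + 1):      # j = len(x) + len(y)
        for i in range(j):             # i = len(x), so y = w[i:j] is nonempty
            x, y, z = w[:i], w[i:j], w[j:]
            if in_lang(x + y + y + z):
                found.append((x, y, z))
    return found

def in_anban(s):
    n = len(s) // 2
    return len(s) % 2 == 1 and s == "a" * n + "b" + "a" * n

def in_anbnabn1(s):
    n = s.count("a") - 1               # a word with n+1 a's has this n
    return n >= 1 and s == "a" * n + "b" * n + "a" + "b" * (n + 1)

def is_palindrome(s):
    return s == s[::-1]

print(pump_splits("aaabaaa", in_anban))          # []
print(pump_splits("aabbabbb", in_anbnabn1))      # []

w = "a" * 80 + "b" + "a" * 80
print(pump_splits(w, is_palindrome, max_xy=77))                     # []
print(("a" * 80, "b", "a" * 80) in pump_splits(w, is_palindrome))   # True
```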


EXAMPLE

Let us consider the language

PRIME = {a^p where p is a prime}
      = {aa aaa aaaaa aaaaaaa ...}

Is PRIME a regular language? If it is, then there is some FA that accepts exactly these words. Let us keep one such automaton in mind. Let us suppose, for the sake of argument, that it has 345 states. Let us choose a prime number bigger than 345, for example, 347. Then a^{347} can be broken into parts x, y, and z such that xy^n z is in PRIME for any value of n. The parts x, y, and z are all just strings of a's. Let us take the value n = 348. By the pumping lemma, the word xy^{348}z must be in PRIME. Now

xy^{348}z = xyz y^{347}

We can write this because the factors x, y, and z are all solid clumps of a's, and it does not matter in what order we concatenate them. All that matters is how many a's we end up with. Let us write

xyz y^{347} = a^{347} y^{347}

This is because x, y, and z came originally from breaking up a^{347} into three parts. We also know that y is some (nonempty) string of a's. Let us say that y = a^m for some integer m that we do not know.

a^{347} y^{347} = a^{347} (a^m)^{347}
                = a^{347 + 347m}
                = a^{347(m+1)}

These operations are all standard algebraic manipulations.

What we have arrived at is that there is an element in PRIME that is of the form a to the power 347(m + 1). Now because m > 0, we know that 347(m + 1) is not a prime number. But this is a contradiction, because all the strings in PRIME are of the form a^p, where the exponent is a prime number. This contradiction arose from the assumption that PRIME was a regular language. Therefore, PRIME is nonregular.
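The arithmetic at the heart of this argument is easy to confirm. Under the text's assumptions (|xyz| = 347 and y = a^m with 1 ≤ m ≤ 347), pumping to n = 348 gives a word of length 347 + 347m = 347(m + 1). The sketch below, with hypothetical helper names, checks that this length is never prime for any possible m.

```python
def is_prime(k):
    """Trial-division primality test (adequate for small k)."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

def pumped_length(total, m, n):
    """Length of x y^n z when len(xyz) = total and len(y) = m."""
    return total + (n - 1) * m

p = 347          # the prime chosen in the text
n = p + 1        # pump up to x y^348 z
lengths = [pumped_length(p, m, n) for m in range(1, p + 1)]   # every possible len(y)

print(all(length == p * (m + 1) for m, length in enumerate(lengths, start=1)))  # True
print(any(is_prime(length) for length in lengths))                              # False
```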


THE MYHILL-NERODE THEOREM

The pumping lemma is negative in its application. It is used exclusively to show that certain languages are not regular because they cannot meet its requirements. We shall now introduce another method for showing that a given language is nonregular, one that also has a constructive aspect to it.

If we consider a particular FA, then each state, whether a final state or not, can be thought of as creating a society of a certain class of strings. Here, we are talking about all strings, not only accepted words. Two strings can be said to both belong to the society of state x_4 if they both trace a path from start to x_4, even if the paths are very different. Similarly, every other state defines a society. Because every one of the infinitely many possible input strings ends up at one of the finitely many states, some of these societies have infinite membership.

If string x and string y are in the same society, then for all other strings z, either both xz and yz are accepted by the machine or both are rejected. This simply depends on whether the string z traces a path from the mutual state of x and y to a final state.

Now let us consider this from the aspect of a regular language without reference to any 
one of the many FAs that recognize it. 


THEOREM 15

Given a language L, we shall say that any two strings x and y are in the same class if for all 
possible strings z either both xz and yz are in L or both are not. 

1. The language L divides the set of all possible strings into separate (mutually exclusive) classes.

2. If L is regular, the number of classes L creates is finite.

3. If the number of classes L creates is finite, then L is regular.

PROOF

What needs to be proven in Part 1 is that the description we gave of dividing into classes is not self-contradicting. An example of a bad way of dividing into classes is this: Say any two students at college are in the same class if they have taken a course together. A and B may have taken history together, B and C may have taken geography together, but A and C never took a class together. Then A, B, and C are not all in the same class. This cannot happen according to our definition of classes. If, for every string Z, AZ and BZ are either both in L or both not, and if, for every string Z, BZ and CZ are either both in L or both not, then the same holds for AZ and CZ, so A, B, and C must all be in the same class. If S is in a class with X and S is also in a class with Y, then by the reasoning above X and Y must be in the same class. Therefore, S cannot be in two different classes. No string is in two different classes, and by definition every string is in some class. Therefore, every string is in exactly one class.

To prove Part 2, we know that because L is regular, there is some FA that accepts L, and its finitely many states create a finite division of all strings into finitely many societies as described above. We still use the word society instead of class since these societies are not actually identical to what we have defined as classes in the theorem. The problem is that two different states might define societies that are actually the same class. In the example below:

[Figure: an FA in which states 1 and 2 define societies belonging to the same class]

both states 1 and 2 have the property that any word in them, when followed by a string z, will be accepted if z contains an a and rejected otherwise. These two societies are in the same class. It is true that the societies defined by the states in this machine are either separate classes in the sense of this theorem or can be grouped to form classes. In either case, the number of classes is not more than the number of societies, and that is finite.

It should come as no surprise to us that the number of classes was not exactly the number of societies, because the number of classes language L creates is dependent on L alone, whereas the number of societies depends on which FA we choose to recognize L.

We are going to prove Part 3 by what appears to be a constructive algorithm, but in fact it is not. This is because we will turn the set of finitely many classes that L creates into an FA, with each state representing one class. However, to be truly constructive, we have to know how to go from “L creates finitely many classes” to “these are the classes.” This we have no idea how to do. What we will do is go from “these are the classes” to “here is the FA.”

Let the finitely many classes be C_1, C_2, ..., where C_1 is the class containing Λ. We will turn this collection of classes into an FA by showing how to draw the edges between them and how to assign start and final states.

The start state must be C_1 because the path for Λ begins and ends in the start state. Now we make another observation: If a class contains one word of L, then all the strings in the class are words in L. To prove this, let w be in class C_7 and a word in L, and let s be any other string in the class. Then letting z = Λ, we know that both wΛ and sΛ are either in L or not. Because wΛ is in L, then so is sΛ = s. Therefore, some of the classes are completely contained in L and some have no L words. Label all those that are subsets of L with +'s. We should also note that all words in L are in the final states.

If x and y are two strings in class C_4, say, then by definition for all strings z, both xz and yz are in L or not. Also, both xa and ya must be in the same class because for all strings z, both xaz and yaz must be in L or not, because az can be considered a tail added to x and y in class C_4. If we take every string in C_4 and add an a on the right, the resultant strings would therefore all be in the same class. Draw an a-edge from C_4 to this class. Similarly, draw all the a-edges and all the b-edges.

There is no guarantee that the picture which results is connected or has only enterable states, but it is an FA. Also, any string that can trace a path from the start to a final state must be in L, and every string in L must end in a final state. Therefore, if a language creates a finite set of classes by the definition of the theorem, it is a regular language.
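The construction of Part 3 can be sketched in code if we are handed some way of telling classes apart, which, as noted above, is exactly what the theorem does not supply. As a stand-in, this hypothetical sketch assumes the class of a string x can be identified by which short suffixes z (here len(z) ≤ 4) put xz in L; that assumption happens to be exact for the language of words containing aa, but it is not valid in general. Starting from the class of Λ, it draws an a-edge and a b-edge out of each class and marks as final the classes whose strings are in L.

```python
import itertools

def strings_up_to(k, alphabet="ab"):
    for length in range(k + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

def build_fa_from_classes(in_lang, alphabet="ab", suffix_len=4):
    """One state per class; start = class of the null string; the ch-edge from
    the class of x goes to the class of x+ch; finals = classes inside L.
    ASSUMPTION: a class is identified by the suffixes z (len <= suffix_len)
    that put xz in L."""
    suffixes = list(strings_up_to(suffix_len, alphabet))

    def class_of(x):
        return tuple(in_lang(x + z) for z in suffixes)

    start = class_of("")
    representative = {start: ""}           # one string from each class found so far
    delta = {}
    todo = [start]
    while todo:
        c = todo.pop()
        for ch in alphabet:
            target = class_of(representative[c] + ch)
            delta[(c, ch)] = target
            if target not in representative:
                representative[target] = representative[c] + ch
                todo.append(target)
    finals = {c for c, x in representative.items() if in_lang(x)}
    return start, delta, finals, representative

def accepts(fa, w):
    start, delta, finals = fa
    state = start
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

def contains_aa(s):
    return "aa" in s

start, delta, finals, representative = build_fa_from_classes(contains_aa)
print(len(representative))  # 3
print(all(accepts((start, delta, finals), s) == contains_aa(s) for s in strings_up_to(8)))  # True
```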


Myhill we have met before; Anil Nerode published this theorem in 1958. 

First, we shall illustrate Part 3 with some examples. There are not many languages L for 
which we know what classes they create, but there are some. 

EXAMPLE 

Let us consider the language of all words that end in a. At first, it may seem that there is only one class here because for all x and y, both xz and yz end in a or not, depending on z alone. But this overlooks the fact that if z is Λ, then whether xz and yz are in L depends on whether x and y end in a themselves. There are therefore two classes:

C_1 = all strings that end in a, a final state
C_2 = all strings that do not, the start state

The FA is 



as we have seen before. 


EXAMPLE 


Let L be the language of all strings that contain a double a. There are three classes:

C_1 = strings without aa that end in a
C_2 = strings without aa that end in b or Λ
C_3 = strings with aa, the final state

States 1 and 2 are different because adding an a to any string in C_1 puts it in L, but it will not do the same for a string in C_2. Also, C_3 is different because adding z = Λ to the strings in C_3 will put them in L, while it will not for strings in C_1 or C_2. As we have seen before, the machine is

[Figure: the three-state FA for words containing aa]

EXAMPLE

Working the algorithm of Theorem 15 (see p. 96) on the language EVEN-EVEN creates four obvious states:

C_1 = EVEN-EVEN
C_2 = even a's, odd b's
C_3 = odd a's, even b's
C_4 = odd a's, odd b's

Clearly, if x and y are in any one class, then both xz and yz are in L or not, depending on how many a's and b's z alone has. The FA is exactly the same as we have had before. ■
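The class counts in the last three examples can be checked experimentally. The sketch below groups all strings of length at most 4 by which suffixes of length at most 3 turn them into words. The helper names and the length bounds are assumptions for illustration, and because Theorem 15 quantifies over every z, such a bounded test can only suggest, never prove, how many classes a language creates. For these three languages it finds 2, 3, and 4 groups, matching the text.

```python
import itertools

def strings_up_to(k, alphabet="ab"):
    for length in range(k + 1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)

def approximate_classes(in_lang, prefix_len=4, suffix_len=3):
    """Group strings x (len <= prefix_len) by WHICH test suffixes z
    (len <= suffix_len) make xz a word. Only an approximation of the classes."""
    suffixes = list(strings_up_to(suffix_len))
    groups = {}
    for x in strings_up_to(prefix_len):
        signature = tuple(in_lang(x + z) for z in suffixes)
        groups.setdefault(signature, []).append(x)
    return list(groups.values())

def ends_in_a(s):
    return s.endswith("a")

def contains_aa(s):
    return "aa" in s

def even_even(s):
    return s.count("a") % 2 == 0 and s.count("b") % 2 == 0

for name, lang in [("ends in a", ends_in_a), ("contains aa", contains_aa), ("EVEN-EVEN", even_even)]:
    print(name, len(approximate_classes(lang)))
# ends in a 2
# contains aa 3
# EVEN-EVEN 4
```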

For the purpose of this chapter, it was actually Part 2 that we were the most interested in, because it offers us a technique, different from the pumping lemma, for proving that certain languages are nonregular. If we can show that a given language L creates infinitely many classes, then we know L is nonregular.

EXAMPLE 

To show that the language a^n b^n is nonregular, we need only observe that the strings a, aa, aaa, aaaa, ... are all in different classes because for each m, only a^m is turned into a word in L by z = b^m. ■

EXAMPLE 

To show that a^n b a^n is nonregular, we note that the strings ab, aab, aaab, ... are all in different classes because for each of them, one value of z = a^m will produce a word in L and leave the others out of L. ■

EXAMPLE 

EQUAL is nonregular because, for each of the strings a, aa, aaa, aaaa, ..., some value of z = b^m will put it alone in EQUAL. ■

EXAMPLE 


PALINDROME is nonregular because ab, aab, aaab, ... are all in different classes. For each of these, one value of z = a^m will create a PALINDROME when added to it but to no other. ■
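Each of the last four arguments names an infinite list of strings and, for each string x, one suffix z that puts x in the language and leaves every other listed string out. The sketch below (helper names are hypothetical) checks that property on the first fourteen strings of each list; it illustrates the separating suffixes rather than proving the claim for every member of the infinite list.

```python
def in_anbn(s):
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def in_anban(s):
    n = len(s) // 2
    return len(s) % 2 == 1 and s == "a" * n + "b" + "a" * n

def in_equal(s):
    return s.count("a") == s.count("b")

def is_palindrome(s):
    return s == s[::-1]

def separates(in_lang, strings, suffix_for):
    """True if, for each string x, the suffix z = suffix_for(x) puts x in the
    language and leaves every other listed string out of it."""
    return all(in_lang(x + suffix_for(x)) and not in_lang(y + suffix_for(x))
               for x in strings for y in strings if x != y)

a_runs = ["a" * m for m in range(1, 15)]           # a, aa, aaa, ...
a_runs_b = ["a" * m + "b" for m in range(1, 15)]   # ab, aab, aaab, ...

print(separates(in_anbn, a_runs, lambda x: "b" * len(x)))                # True
print(separates(in_equal, a_runs, lambda x: "b" * len(x)))               # True
print(separates(in_anban, a_runs_b, lambda x: "a" * (len(x) - 1)))       # True
print(separates(is_palindrome, a_runs_b, lambda x: "a" * (len(x) - 1)))  # True
```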






CHAPTER 10 Nonregular Languages 


EXAMPLE 


Let us define the language DOUBLEWORD to be the collection of all words that are of the form SS, where S is any string of a's and b's. DOUBLEWORD starts out with these words: aa bb aaaa abab baba bbbb aaaaaa .... Let us use Theorem 15 to prove that the language DOUBLEWORD is nonregular. It is not so obvious when two strings are in different classes since strings can turn into doublewords in various ways. For example, x = bb and y = bbbb can each be turned into words in DOUBLEWORD using z = bb. However, the following infinite set of strings is easy to show as belonging to different classes: ab aab aaab aaaab .... For any two strings x and y we choose from the set above, we let z = x and find that xz is in DOUBLEWORD but not yz. Therefore, DOUBLEWORD creates infinitely many classes (at least one for each string above and maybe more) and is therefore nonregular.
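Both observations in this example can be checked directly. The sketch below (the helper name is hypothetical) confirms that bb and bbbb both become doublewords with z = bb, so that suffix does not separate them, and that for the strings ab, aab, aaab, ... the choice z = x makes xz a doubleword while yz is not, for every pair among the first nineteen strings tried.

```python
def is_doubleword(s):
    """True if s = SS for some string S."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

# x = bb and y = bbbb both become doublewords with z = bb.
print(is_doubleword("bb" + "bb"), is_doubleword("bbbb" + "bb"))  # True True

# For ab, aab, aaab, ...: z = x separates x from every other y in the list.
strings = ["a" * k + "b" for k in range(1, 20)]
separated = all(is_doubleword(x + x) and not is_doubleword(y + x)
                for x in strings for y in strings if x != y)
print(separated)  # True
```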




QUOTIENT LANGUAGES 


Now that we have proven there are such things as nonregular languages, we have more respect for the theorem stating that the product of any two regular languages is always regular. We are also ready to approach the question of whether there is a corresponding division theorem; that is, can we prove that the quotient of two regular languages is regular?

There is a problem here regarding what it means to say that the language Q is the quotient of the two regular languages P and R. If we write


Q = R/P


whenever it is true that 


PQ = R 

then, in some cases, the symbol R/P does not determine a unique language. For example, if P, Q, and R are all the language a*, then it is true that

PQ = R

so therefore we may write

a* = a*/a*

On the other hand, if P and R are both the language a*, while Q is the one-word language {Λ}, then PQ = R is still true, which means we also have to write

{Λ} = a*/a*

Similarly, we can show that

{Λ a aaaa aaaaaaaa} = a*/a*

There are infinitely many choices for the meaning of R/P even in this simple case of the one-letter alphabet.

What happens if we do not use the division symbol itself as an operation to produce a unique language, but instead attempt to get around the ambiguity by proving that all these languages that could be interpreted as R/P are regular? We could then make the following claim.



PSEUDOTHEOREM

If for three languages P, Q, and R we have

PQ = R

and P and R are regular, then Q must also be regular.

The reason that we have called this a pseudotheorem is that it is not true. 

DISPROOF 

Let us assume, for a moment, that this claim is true. Now let P be the language defined by the regular expression a* and let Q be the product of {a^n b^n} and b*, where we let n start from 0, which will allow the word Λ in the language. Now let R be the language defined by a*b*. In this case, it is true that

PQ = a*({a^n b^n} b*)
   = a*b* = R

Because both P and R are regular, if the preceding claim is true, then Q must be regular. Now all we have to do to disprove the claim is show that this Q is not regular. This is not hard to do.

The language Q is the set of all strings of the form a^x b^y where x ≤ y. If Q were regular, it could be accepted by a certain FA with some fixed number of states; let us call it N. The word a^N b^N is accepted by this machine in a path that contains a loop of solid a's. Cycling around this loop one extra time will create a path through the machine that leads to acceptance and corresponds to a word with more than N a's and only N b's. This word should not be in Q; therefore, no FA that can be imagined can accept exactly the language Q. So, Q is not regular, and the claim in the pseudotheorem is false.

Quod Erat Demolition 
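The equation PQ = a*b* used in this disproof can be checked on short strings. In this sketch (helper names are hypothetical), in_Q tests membership in Q = {a^x b^y : x ≤ y}, and in_PQ looks for a split s = pq with p in a* and q in Q; the list of strings up to length 8 where PQ and a*b* disagree comes out empty.

```python
import itertools
import re

def in_Q(s):
    """Q = {a^n b^n} b* with n >= 0, i.e. all a^x b^y with x <= y."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) <= len(m.group(2))

def in_PQ(s):
    """s is in PQ, with P = a*, if some split s = p q has p in a* and q in Q."""
    return any(re.fullmatch(r"a*", s[:i]) is not None and in_Q(s[i:])
               for i in range(len(s) + 1))

bad = [s for k in range(9) for s in map("".join, itertools.product("ab", repeat=k))
       if in_PQ(s) != (re.fullmatch(r"a*b*", s) is not None)]
print(bad)  # []
```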

We do not need to abandon all hope of finding a result similar to a division theorem if we concentrate on the P factor and not the Q factor in the product. Let us imagine that we have a regular language R and some of its words end in a string that is a word in the language Q. If we focus our attention only on these words of R (the ones that end in a Q-word) and we define the language P to be the set of front-halves of these words, we can indeed prove that P is regular. Let us call these front-halves the prefixes that can be attached to some words in Q to obtain some words in R.

Let us state this cautiously.

DEFINITION 

If R and Q are languages, then the language “the prefixes of Q in R,” denoted by the symbolism

Pref(Q in R)

is the set of all strings of letters that can be concatenated to the front of some word in Q to produce some word in R.

We may write this as

Pref(Q in R) = the set of all strings p such that there exist words q in Q and w in R such that pq = w ■

EXAMPLE 

If Q is the language

{aa abaaabb bbaaaaa bbbbbbbbbb}

and R is the language

{b bbbb bbbaaa bbbaaaaa}

then the language of the prefixes of Q in R is

Pref(Q in R) = {b bbba bbbaaa}

because the first word in Q can be made into a word in R in two ways and the third word in Q can be made into a word in R in one way, whereas the other words in Q cannot be made into words in R by the addition of any possible prefixes.

We should note that Λ is only a word in the prefix language if Q and R have some words in common. It is also possible that no word of Q can be made into a word of R by the addition of a prefix. In this case, we say that the prefix language is empty, Pref(Q in R) = ∅.
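For finite languages the definition of Pref(Q in R) can be computed directly: for every q in Q and w in R, if w ends with q, the front part of w is a prefix. The sketch below (the function name is hypothetical) reproduces the example's answer and the remark that Λ appears exactly when Q and R share a word.

```python
def pref_finite(Q, R):
    """Pref(Q in R) for finite languages: all p such that p + q is in R for some q in Q."""
    return {w[: len(w) - len(q)] for q in Q for w in R if w.endswith(q)}

Q = {"aa", "abaaabb", "bbaaaaa", "bbbbbbbbbb"}
R = {"b", "bbbb", "bbbaaa", "bbbaaaaa"}
print(sorted(pref_finite(Q, R)))  # ['b', 'bbba', 'bbbaaa']
```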

EXAMPLE 

If Q = ab*a and R = (ba)*, then the only word in Q that can be made into a word in R is aba, because no word in R has a double letter and all other words in Q do. Also, aba can be made into a word in R by prefixing it with any word of the form (ba)*b. Therefore,

Pref[ab*a in (ba)*] = (ba)*b
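This answer can be checked on short strings with Python's re module. Since Q = ab*a is infinite, the sketch below assumes Q is approximated by its words with at most five b's (harmless here, because only aba can ever help), and compares the resulting prefix test with the regular expression (ba)*b for every string up to length 8. The helper names are hypothetical.

```python
import itertools
import re

def in_R(s):
    return re.fullmatch(r"(ba)*", s) is not None

# ASSUMPTION: Q = ab*a is approximated by its words with at most 5 b's.
q_sample = ["a" + "b" * k + "a" for k in range(6)]

def in_pref(p):
    """p is in Pref(Q in R) if p followed by some sampled word of Q is in R."""
    return any(in_R(p + q) for q in q_sample)

disagreements = []
for k in range(9):
    for letters in itertools.product("ab", repeat=k):
        p = "".join(letters)
        if in_pref(p) != (re.fullmatch(r"(ba)*b", p) is not None):
            disagreements.append(p)
print(disagreements)  # []
```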

We can now prove a version of a division theorem that is at the same time less and more ambitious than we originally intended. It is disappointing in the sense that this prefix language does not actually give us a factorization of the language R into P times Q. In general,

Pref(Q in R)Q ≠ R

because many words of R may not be formed from words in Q by the addition of prefixes, and many words in Q may have nothing whatsoever to do with being parts of words in R. On the other hand, what we can show is that the prefix language is regular whenever R is regular, even if Q is not regular:

THEOREM 16 

If R is a regular language and Q is any language whatsoever, then the language 

P = Pref(Q in R)

is regular. 

PROOF 

Because R is a regular language, let us fix in our minds some FA that accepts R. This machine has one start state and possibly several final states. Now let s be any state in this machine (possibly the start or final state). Let us now process all the words from the language Q on this machine beginning in state s as if it actually were the start state. Either some word (or words) from the language Q will lead to a final state when traced through the FA or else no words from Q will end up in a final state. If any word in Q can begin in s and trace to a final state, paint the state s blue.

Let us make the same determination for all the states in the FA. If they end up blue, then some word from Q can start there and proceed to a final state. If they are not blue, then no word from Q can start there and go to a final state. What results is an FA with one start state and some or no blue states.

Let us now build a new machine from the one with which we started. Let this new machine have exactly the same states and edges as the original FA that accepts R. Let this new FA have the same state labeled start as in the original FA, but let the final states be all the blue states of the old FA and only those, no matter what their final status was in the original machine. We shall now show that the new FA accepts exactly the language P = Pref(Q in R).

To prove this, we have to observe two things: (1) Every word in P is accepted by this 
machine. (2) Every word accepted by this machine is in the language P. 

If w is any word accepted by this machine, then when we trace its processing, beginning
at the start state, the path of w will end in a final state, which on the original FA corresponds
to a state painted blue. This state is blue because some word from Q (call it q) can start there
and run to what was the final state on the original FA. This means that if the string wq was
run on the original FA, it would be accepted, which in turn means that wq is in R and w is in
P. So, we have shown that every word accepted by the machine is in P.

We now have to show that every word in P is, in fact, accepted by this machine. Let p be
any word in P. Then by the definition there is a word q in Q and a word w in R such that
pq = w. This means that the string pq when run on the original FA leads from start to a final
state. Let us trace this path and note where the processing of the p-part ends and the
processing of the q-part begins. This will be at a state from which q runs to a final state, and it is
therefore blue. This means that on the original machine the p-part traces from start to blue.
Therefore, on the new FA the p-part traces from start to a final state. Thus, p is accepted by
the new FA.

The language of this new machine is P, the whole P, and nothing but the P. Therefore, P 
is regular. ■ 
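The construction in this proof can be sketched in Python under one assumption the proof itself does not make: Q is supplied as a finite list of words, so that the blue-painting step actually terminates. The three-state FA for R = (ba)* (with state 2 as a dead state) and the sample of Q are hypothetical illustrations, chosen to match the example Pref(ab*a in (ba)*) = (ba)*b from earlier in this section.

```python
from typing import Dict, Iterable, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def run(delta: Delta, state: int, word: str) -> int:
    """Trace a word through the FA, starting from any chosen state."""
    for letter in word:
        state = delta[(state, letter)]
    return state


def blue_states(delta: Delta, states: Iterable[int], finals: Set[int],
                q_words: Iterable[str]) -> Set[int]:
    """Paint s blue if some word of Q runs from s to a final state.

    Assumes Q is given as a finite list; for an infinite Q this test could
    not be completed, which is why the proof is nonconstructive.
    """
    q_list = list(q_words)
    return {s for s in states
            if any(run(delta, s, q) in finals for q in q_list)}


# Hypothetical FA for R = (ba)*: state 0 is start and final, 2 is a dead state.
DELTA: Delta = {
    (0, "b"): 1, (0, "a"): 2,
    (1, "a"): 0, (1, "b"): 2,
    (2, "a"): 2, (2, "b"): 2,
}
STATES = {0, 1, 2}
FINALS = {0}

# A finite sample of Q = ab*a: the words a b^k a for k = 0..4.
Q_SAMPLE = ["a" + "b" * k + "a" for k in range(5)]

# New FA: same states, edges, and start state; final states are the blue ones.
PREF_FINALS = blue_states(DELTA, STATES, FINALS, Q_SAMPLE)


def accepts_pref(word: str) -> bool:
    """Accept exactly the words that lead from start to a blue state."""
    return run(DELTA, 0, word) in PREF_FINALS


print(PREF_FINALS)  # prints {1}
```

Only state 1 is blue here (the word aba runs from 1 back to 0), so the new machine accepts the words that end in state 1, which is (ba)*b. For this particular Q the finite sample happens to find every blue state, since every word of ab*a sends states 0 and 2 to the dead state; in general a finite sample of an infinite Q can miss blue states.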

We should take particular note of the fact that although this proof looks like a proof by 
constructive algorithm, it is not that at all. We glibly tossed in the phrase “process all the 
words from the language Q on this machine starting in state s . . . .” This is not easy to do
if Q is an infinite language. This is indeed a weakness in practical terms, but it is not a flaw
that invalidates the proof. It is still very much true that for each state s, either there is some 
word in Q that runs from there to a final state or else there is not. Therefore, every state of 
the machine is either definitely blue or definitely not blue. The trouble is that we have not 
provided a constructive method for deciding which. What we have proven is that there exists 
an FA that accepts the language Pref(Q in R) without having shown how to build one. This
method of proof is called a nonconstructive existence proof, and as such, it is just like the 
proof of Part 3 of the Myhill-Nerode theorem. 

PROBLEMS 

1. Use the pumping lemma to show that each of these languages is nonregular: 

(i) {$a^n b^{n+1}$} = {abb aabbb aaabbbb . . .}

(ii) {$a^n b^n a^n$} = {aba aabbaa aaabbbaaa aaaabbbbaaaa . . .}

(iii) {$a^n b^{2n}$} = {abb aabbbb aaabbbbbb . . .}

(iv) {$a^n b a^n$} = {aba aabaa aaabaaa . . .}

(v) {$a^n b^n a^m$ where n = 0, 1, 2, . . . and m = 0, 1, 2, . . .} = {Λ a aa ab aaa
aba . . .}

2. Prove that the five languages in Problem 1 are nonregular using the Myhill-Nerode
theorem.

3. Use the pumping lemma to prove that the language DOUBLEWORD from p. 200 is
nonregular.

4. Define the language TRAILING-COUNT as any string s followed by a number of a’s
equal to length(s).

TRAILING-COUNT = {aa ba aaaa abaa baaa bbaa aaaaaa aabaaa 
abaaaa . . .} 

Prove that this language is nonregular by the 

(i) Pumping lemma. 

(ii) Myhill-Nerode theorem. 

5. Define the languages 

EVENPALINDROME = {all words in PALINDROME that have even length} 

= {aa bb aaaa abba baab bbbb . . .} 

ODDPALINDROME = {all words in PALINDROME that have odd length} 

(i) Show that each is nonregular by the pumping lemma. 

(ii) Show that each is nonregular by the Myhill-Nerode theorem. 

6. Define the language SQUARE as follows: 

SQUARE = {$a^n$ where n is a square}

= {a aaaa aaaaaaaaa . . .} 

This language could also be written as {$a^{n^2}$}.

(i) Use the pumping lemma to prove that SQUARE is nonregular. 

(ii) Use the Myhill-Nerode theorem to prove that SQUARE is nonregular. 

7. Define the language DOUBLESQUARE as follows: 

DOUBLESQUARE = {$a^n b^n$ where n is a square}

= {ab aaaabbbb aaaaaaaaabbbbbbbbb . . .}

Prove that DOUBLESQUARE is nonregular by the 

(i) Pumping lemma. 

(ii) Myhill-Nerode theorem. 

8. Define the language DOUBLEPRIME as follows: 

DOUBLEPRIME = {$a^p b^p$ where p is any prime}

= {aabb aaabbb aaaaabbbbb . . .} 

Prove that DOUBLEPRIME is nonregular by the 

(i) Pumping lemma. 

(ii) Myhill-Nerode theorem. 

9. Define the language DOUBLEFACTORIAL as follows: 

DOUBLEFACTORIAL = {$a^{n!} b^{n!}$}

= {ab aabb aaaaaabbbbbb . . .}

Prove that DOUBLEFACTORIAL is nonregular by the 

(i) Pumping lemma. 

(ii) Myhill-Nerode theorem. 
10. Just for this problem, let the alphabet be Σ = {a b c}. Let us consider the language

$a^n b^n c^n$ = {abc aabbcc aaabbbccc . . .}

Prove that this language is nonregular by the 

(i) Pumping lemma. 

(ii) Myhill-Nerode theorem. 

11. Let us revisit the language DOUBLEWORD from p. 200. Use the Myhill-Nerode
theorem to show that this language is nonregular by showing that all the strings in a* are in
different classes.

12. Let us consider the language of algebraic expressions, ALEX, defined by the recursive
definition on p. 29. We never attempted to give a regular expression for this language 
because it is nonregular. Prove this using the Myhill-Nerode theorem and the sequence 

(A ((A (((A. . . 

13. Define the language MOREA as follows: 

MOREA = {all strings of a’s and b’s in which the total number of a’s is greater than the
total number of b’s}

= {a aa aab aba baa aaab aaba . . .} 

(i) Use the fact that 

MOREA' ∩ MOREB' ∩ (a + b)* = EQUAL

to prove that MOREA is nonregular (where MOREB has its obvious meaning). 

(ii) Explain why the pumping lemma cannot be used to prove that MOREA is
nonregular.

(iii) Show that MOREA can be shown to be nonregular by the Myhill-Nerode theorem 
by using the sequence 

aab aaab aaaab aaaaab . . . 

14. Let $L_1, L_2, L_3, \ldots$ be an infinite sequence of regular languages.

(i) Let L be the infinite union of all these languages taken together. Is L necessarily 
regular? 

(ii) Is the infinite intersection of all these languages necessarily regular? 

15. (i) Give an example of a regular language R and a nonregular language N such that
R + N is regular.

(ii) Give an example of a regular language R and a nonregular language N such that 
R + N is nonregular. 

16. Consider the following language: 

PRIME' = {$a^n$ where n is not a prime}

= {Λ a aaaa aaaaaa aaaaaaaa . . .}

(i) Prove that PRIME' is nonregular. 

(ii) Prove, however, that PRIME' does satisfy the pumping lemma. 

(iii) How can this be? 

17. (i) Show that if we add a finite set of words to a regular language, the result is a
regular language.

(ii) Show that if we subtract a finite set of words from a regular language, the result is a
regular language.

(iii) Show that if we add a finite set of words to a nonregular language, the result is a
nonregular language.

(iv) Show that if we subtract a finite set of words from a nonregular language, the result
is a nonregular language.

18. The proof of Theorem 16 used FAs to show that the language Pref(Q in R) is regular. Show
that this language is regular using the Myhill-Nerode theorem instead.

19. Let us define the language PARENTHESES to be the set of all algebraic expressions
from which everything but the parentheses have been deleted. For example, the expression
(3 + (4*7) + (8 + 9)) + (2 + 1) becomes the word (()())().

PARENTHESES = {Λ () (()) ()() ((())) (())() ()(()) ()()() . . .}

(i) Show that this language is nonregular using the Myhill-Nerode theorem.

(ii) Show that the pumping lemma cannot be successful in proving that this language is
nonregular.

(iii) If we convert the character “(” into the letter a and the character “)” into the letter
b, show that PARENTHESES becomes a subset of the language EQUAL in which
each word has the property that when read from left to right, there are never more
b’s than a’s.


20. Consider what happens when an FA is built for an infinite language over the one-letter
alphabet Σ = {a}. When the input is a string of a’s that is longer than the number of
states, the path it traces must take the form of some initial sequence of edges followed
by a circuit. Because all the words in the language accepted by the machine are strings
of a’s, all the long words accepted by this FA follow the same path up to the circuit and
then around and around as in the picture below:

[Figure: a path of states leading into a circuit]



Some of the states leading up to the circuit may be final states and some of the
states in the circuit may be final states. This means that by placing + signs judiciously
along a long path to the circuit, we can make the machine accept any finite set of words
$S_1$. While going around the circuit the first time, the FA can accept another finite set of
words $S_2$. If the length of the circuit is n, all words of the form $a^n$ times a word in $S_2$ will
also be accepted on the second go-round of the circuit.

(i) Prove that if L is any regular language over the alphabet Σ = {a}, then there are two
finite sets of words $S_1$ and $S_2$ and an integer n such that

$L = S_1 + S_2 (a^n)^*$

(ii) Consider the language L defined as 

L = {$a^n$ where n is any integer with an even number of digits in base 10}

= {Λ $a^{10}$ $a^{11}$ $a^{12}$ . . .}




Prove that L is nonregular. 



CHAPTER 11 


Decidability 


EQUIVALENCE 

In this part of the book, we have laid the foundations for the theory of finite automata. The
pictures and tables that we have called “machines” can actually be built out of electronic
components and operate exactly as we have described. Certain parts of a computer and certain aspects
of a computer obey the rules we have made up for FAs. We have not yet arrived, though, at a
mathematical model for a whole computer. That we shall present in Part III. But before we
leave this topic, we have some unfinished business to clear up. Along the way, we asked some
very basic questions that we deferred considering. We now face three of these issues:

1. How can we tell whether two regular expressions define the same language? 

2. How can we tell whether two FAs accept the same language? 

3. How can we tell whether the language defined by an FA has finitely many or infinitely 
many words in it, or any words at all, for that matter? 

In mathematical logic, we say that a problem is effectively solvable if there is an
algorithm that provides the answer in a finite number of steps, no matter what the particular
inputs are. The maximum number of steps the algorithm will take must be predictable before
we begin to execute the procedure. For example, if the problem was, “What is the solution to
a quadratic equation?”, then the quadratic formula provides an algorithm for calculating the
answer in a predetermined number of arithmetic operations: four multiplications, two
subtractions, one square root, and one division. The number of steps in the algorithm is never
greater than this no matter what the particular coefficients of the polynomial are. Other
suggestions for solving a quadratic equation (such as “keep guessing until you find a number
that satisfies the equation”) that do not guarantee to work in a fixed number of steps are not
considered effective solutions, nor are methods that do not work in all cases (such as “try
x = 2, it couldn’t hurt”).


DEFINITION 

An effective solution to a problem that has a yes or no answer is called a decision
procedure. A problem that has a decision procedure is called decidable. ■


The first thing we want to decide is whether two regular expressions determine the exact 
same language. We might, very simply, use the two expressions to generate many words
from each language until we find one that obviously is not in the language of the other. To be
even more organized, we may generate the words in size order, smallest first. In practice, this
method works fairly well, but there is no mathematical guarantee that we find such an
obvious benchmark word at any time in the next six years. Suppose we begin with the two
expressions

a(a + b)* and (b + Λ)(baa + ba*)*

It is obvious that all the words in the language represented by the first expression begin with
the letter a and all the nonnull words in the language represented by the second expression begin
with the letter b. These expressions have no word in common; this fact is very clear.
However, consider these two expressions:

(aa + ab + ba + bb)* and ((ba + ab)*(aa + bb)*)* 

Both define the language of all strings over Σ = {a b} with an even number of letters. If
we did not recognize this, how could we decide the question of whether they are equivalent?
We could generate many examples of words from the languages each represents, but we
would not find a difference. Could we then conclude that they are equivalent? It is logically
possible that the smallest example of a word that is in one language but not in the other has
96 letters. Maybe the smallest example has 2 million letters. Generating words and praying
for inspiration is not an effective procedure, and it does not decide the problem.
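The generate-and-compare approach just dismissed is easy to write down, which makes its weakness concrete. The sketch below is an illustration only, not a decision procedure. It assumes the book's notation has been translated by hand into Python's `re` syntax (+ becomes |, and Λ becomes an empty alternative), and the cutoff `max_len` is an arbitrary choice of ours. Finding no difference up to that length says nothing about longer words.

```python
import itertools
import re
from typing import Optional


def first_difference(pattern1: str, pattern2: str,
                     alphabet: str = "ab", max_len: int = 8) -> Optional[str]:
    """Return the first word, in size order, matched by exactly one pattern.

    Returns None if no difference is found up to max_len letters, which
    does NOT show that the two expressions are equivalent.
    """
    r1, r2 = re.compile(pattern1), re.compile(pattern2)
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return w
    return None


# a(a + b)*  versus  (b + Λ)(baa + ba*)*
print(repr(first_difference(r"a(a|b)*", r"(b|)(baa|ba*)*")))  # prints ''
# (aa + ab + ba + bb)*  versus  ((ba + ab)*(aa + bb)*)*
print(first_difference(r"(aa|ab|ba|bb)*", r"((ba|ab)*(aa|bb)*)*"))  # prints None
```

The first pair is separated at once by the null word, which only the second expression defines. For the second pair the search finds nothing, exactly as the text predicts, and it would find nothing for any cutoff; that is precisely why this search cannot be turned into a proof of equivalence.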

The following two expressions are even less clear: 

((b*a)*ab*)* and Λ + a(a + b)* + (a + b)*aa(a + b)*

They both define the language of all words that either start with an a or else have a double a
in them somewhere or else are null. The suggestion that we should “interpret what the regular
expressions mean and see whether or not they are the same” is, of course, hopeless.

Before we answer the first major question of this chapter, let us note that it is virtually
the same as the second question. If we had a decision procedure to determine whether two
regular expressions were equivalent, we could use it to determine whether two FAs were
equivalent. First, we would convert the FAs into regular expressions and then decide about
the regular expressions. The process of converting FAs into regular expressions is an
effective procedure that we developed in the proof of Kleene’s theorem in Chapter 7. The number
of steps required can be predicted in advance based on the size of the machine to be
converted. Since the conversion process eliminates at least one state with each step, a machine
with 15 states will take at most 16 steps to convert into a regular expression (counting the
step that creates a unique − and a unique +).

Similarly, if we had an effective procedure to determine whether two FAs were
equivalent, we could use it to decide the problem for regular expressions by converting them into
FAs.

Fortunately, we have already developed all the algorithms necessary to decide the 
“equivalency problem” for FAs and thereby regular expressions. We need only recognize 
how to apply them. 

Given two languages $L_1$ and $L_2$ defined by either regular expressions or FAs, we have
developed (in Chapter 9) the procedures necessary to produce finite automata for the
languages $L_1'$, $L_2'$, $L_1 \cap L_2'$, and $L_2 \cap L_1'$. Therefore, we can produce an FA that accepts the
language 

$(L_1 \cap L_2') + (L_2 \cap L_1')$

This machine accepts the language of all words that are in $L_1$ but not $L_2$, or else in $L_2$ but not
$L_1$. If $L_1$ and $L_2$ are the same language, this machine cannot accept any words. If this
machine accepts even one word, then $L_1$ is not equal to $L_2$, even if the one word is the null
word. If $L_1$ is equal to $L_2$, then the machine for the preceding language accepts nothing at all.
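A minimal Python sketch of this decision procedure, assuming both FAs are complete (every state has an edge for every letter) and are stored as transition dictionaries; the function name and the two example machines are our own illustrations. Rather than building the four machines separately, it walks the reachable pairs (state of the first FA, state of the second FA) and asks whether any reachable pair has exactly one final component, which is exactly a final state of the machine for $(L_1 \cap L_2') + (L_2 \cap L_1')$.

```python
from collections import deque
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]


def equivalent(delta1: Delta, start1: str, finals1: Set[str],
               delta2: Delta, start2: str, finals2: Set[str],
               alphabet: str) -> bool:
    """Decide whether two complete FAs over the same alphabet are equivalent.

    Only pairs reachable from (start1, start2) are ever examined. A pair in
    which exactly one component is final means some word is in one
    language but not the other.
    """
    seen = {(start1, start2)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        if (p in finals1) != (q in finals2):
            return False  # the symmetric-difference machine accepts a word
        for letter in alphabet:
            pair = (delta1[(p, letter)], delta2[(q, letter)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True  # no reachable final state: the languages are equal


# Two hypothetical machines over {a, b} that both accept a*.
D1: Delta = {("q1", "a"): "q1", ("q1", "b"): "q2",
             ("q2", "a"): "q2", ("q2", "b"): "q2"}
D2: Delta = {("r1", "a"): "r3", ("r1", "b"): "r2",
             ("r3", "a"): "r3", ("r3", "b"): "r2",
             ("r2", "a"): "r2", ("r2", "b"): "r2"}

print(equivalent(D1, "q1", {"q1"}, D2, "r1", {"r1", "r3"}, "ab"))  # True
print(equivalent(D1, "q1", {"q1"}, D2, "r1", {"r3"}, "ab"))        # False
```

In the second call the start pair already differs in finality, which corresponds to the null word being in one language only. Exploring pairs on demand means unreachable combinations are never considered at all, so the emptiness test discussed next is folded into the same pass.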

To make this discussion into an effective decision procedure, we must show that we can 
tell by some algorithm when an FA accepts no words at all. This is not a very hard task, and 
there are several good ways to do it. We make a big fuss about this because it is so simple 
that it might seem unimportant, which is wrong. It is a basic question in its own right—not 
just as part of the decidability of the equivalence of regular languages. 

The following subsections outline how to determine whether an FA accepts any words. 

Method 1

Convert the FA into a regular expression. Every regular expression defines some words. We
can prove this by an algorithm. First, delete all stars. Then for each + we throw away the
right half of the sum and the + sign itself. When we have no more *’s or +’s, we remove the
parentheses and we have a concatenation of a’s, b’s, and Λ’s. These taken together form a
word. For example,

(a + Λ)(ab* + ba*)*(Λ + b*)*
becomes (after we remove *’s)

(a + Λ)(ab + ba)(Λ + b)
which becomes (after we throw away right halves)

(a)(ab)(Λ)

which becomes (after we eliminate parentheses)

a ab Λ

which is the word

aab

This word must be in the language of the regular expression because the operations of
choosing * to be power 1 and + to be the left half are both legal choices for forming words.
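The star-deleting, left-half rule of Method 1 is a simple recursion on the structure of a regular expression. In the sketch below, the nested-tuple encoding of regular expressions is a hypothetical one chosen for illustration; note that, as in the book, there is no symbol for the empty language, which is why every expression yields a word.

```python
# Hypothetical encoding of a regular expression as nested tuples:
#   ("sym", "a")       a single letter
#   ("lam",)           Λ, the null word
#   ("cat", r1, r2)    concatenation r1 r2
#   ("plus", r1, r2)   sum r1 + r2
#   ("star", r)        closure r*
Regex = tuple


def witness(r: Regex) -> str:
    """Produce one word of the language of r, following Method 1.

    Each * is taken to power 1 and each + keeps only its left half;
    Λ contributes nothing to the word.
    """
    kind = r[0]
    if kind == "sym":
        return r[1]
    if kind == "lam":
        return ""
    if kind == "cat":
        return witness(r[1]) + witness(r[2])
    if kind == "plus":
        return witness(r[1])  # throw away the right half
    if kind == "star":
        return witness(r[1])  # star taken to power 1
    raise ValueError(f"unknown node {kind!r}")


A, B, LAM = ("sym", "a"), ("sym", "b"), ("lam",)
# (a + Λ)(ab* + ba*)*(Λ + b*)*
EXPR = ("cat",
        ("cat", ("plus", A, LAM),
                ("star", ("plus", ("cat", A, ("star", B)),
                                  ("cat", B, ("star", A))))),
        ("star", ("plus", LAM, ("star", B))))

print(witness(EXPR))  # prints aab
```

This reproduces the word aab obtained by hand above.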
If every regular expression defines at least one word, it seems at first glance that this means
that every FA must accept at least one word. How then could we ever show that two
languages are equal? If we first build an FA for the language

$(L_1 \cap L_2') + (L_2 \cap L_1')$

and then convert this machine into a regular expression, is it not true that, by the argument above,
we must find some word in the language of the regular expression, and therefore $L_1 \neq L_2$ no
matter what they are? No. The hole in this reasoning is that the process of converting this FA
into a regular expression breaks down. We come down to the last step where we usually have
several edges running from − to + that we add together to form the regular expression


[Figure: several edges labeled $r_1$ through $r_3$ running from − to +]


However, when we get to this last step, we suddenly realize that there are no paths from
− to + at all.






This could happen theoretically in three different ways: The machine has no final states 
such as this one: 


or the final state is disconnected from the start state, as with this one: 


or the final state is unreachable from the start state, as with this one: 



We shall see later in this chapter which of these situations does arise if the languages are 
actually equal. 

Method 2 

Examine the FA to see whether or not there is any path from − to +. If there is any path, then
the machine must accept some words: for one, the word that is the concatenation of the
labels of the edges in the path from − to + just discovered. In a large FA with thousands of
states and millions of directed edges, it may be impossible to decide whether there is a path
from − to + without the guidance of an effective procedure. One such procedure is this:

Step 1 Paint the start state blue. 

Step 2 From every blue state, follow each edge that leads out of it and paint the
destination state blue, then delete this edge from the machine.

Step 3 Repeat step 2 until no new state is painted blue, then stop. 

Step 4 When the procedure has stopped, if any of the final states are painted blue, then 
the machine accepts some words and, if not, it does not. 
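The four steps of Method 2 can be sketched in Python as a worklist search. One liberty is taken with the procedure as stated: instead of sweeping over all blue states repeatedly and deleting edges, each newly painted state has its outgoing edges followed exactly once, which has the same effect. The transition-dictionary representation and the example machine are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def accepts_some_word(delta: Dict[Tuple[str, str], str], start: str,
                      finals: Set[str]) -> bool:
    """Method 2: does any path lead from the start state to a final state?"""
    # Group edge destinations by source state.
    out: Dict[str, List[str]] = {}
    for (src, _letter), dst in delta.items():
        out.setdefault(src, []).append(dst)

    blue = {start}                      # step 1: paint the start state blue
    frontier = [start]
    while frontier:                     # step 3: repeat until nothing new
        state = frontier.pop()
        for dst in out.get(state, []):  # step 2: follow each edge out once
            if dst not in blue:
                blue.add(dst)
                frontier.append(dst)
    return bool(blue & finals)          # step 4: is any final state blue?


# Hypothetical machine: state "3" is final, but no edge leads into it.
EDGES = {("1", "a"): "1", ("1", "b"): "2",
         ("2", "a"): "2", ("2", "b"): "2",
         ("3", "a"): "1", ("3", "b"): "3"}

print(accepts_some_word(EDGES, "1", {"3"}))  # False
print(accepts_some_word(EDGES, "1", {"2"}))  # True
```

Each state is painted at most once, so the loop runs at most as many times as there are states, which matches the observation below that step 2 cannot usefully be repeated more than N times.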

Let us look at this procedure at work on the machine: 




[Figures: the machine after step 1, after step 2, after step 2 again, and after a third application of step 2]

No new states were painted blue this time, so the procedure stops and we examine the + 
state. The + state is not blue, so the machine accepts no words. 

While we were examining the second method, we might have noticed that step 2 cannot 
be repeated more times than there are total states in the machine. If the machine has N states, 
after N iterations of step 2 either they are all colored blue or we have already stopped. We 
can summarize this as a theorem. 

THEOREM 17 

Let F be an FA with N states. Then if F accepts any words at all, it accepts some word with 
N or fewer letters. 

PROOF 

The shortest path from − to + (if there is any) cannot contain a circuit because if we go
from − to state 7 and then around a circuit back to state 7 and then to +, it would have been
shorter to go from − to state 7 to + directly. If there is a path from − to + without a circuit,
then it can visit each state at most one time. The path can then have at most N edges and the
word that generates it can have at most N letters. ■

The proof actually shows that the shortest word must have at most N − 1 letters,
because if the start state is a final state, then the word Λ is accepted and with N − 1 letters we
can visit the other N − 1 states. The FA below has four states, but it accepts no word with
fewer than three letters, so we see that the bound N − 1 is the best possible:
[Figure: a four-state FA that accepts no word with fewer than three letters]
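A shortest accepted word can be found with a breadth-first search from the start state, and the argument of Theorem 17 shows why its length is bounded: a shortest path never repeats a state. The sketch below is ours, not the book's; the four-state chain machine is a hypothetical stand-in with the same property as the figure (four states, no accepted word shorter than three letters).

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def shortest_accepted(delta: Dict[Tuple[str, str], str], start: str,
                      finals: Set[str], alphabet: str) -> Optional[str]:
    """Return a shortest word accepted by the FA, or None if it accepts none.

    Breadth-first search visits states in order of distance from the start
    state, and the path it finds repeats no state, so for an N-state FA the
    word returned has at most N - 1 letters.
    """
    parent: Dict[str, Tuple[str, str]] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            letters: List[str] = []
            while state != start:          # walk back along recorded edges
                prev, letter = parent[state]
                letters.append(letter)
                state = prev
            return "".join(reversed(letters))
        for letter in alphabet:
            nxt = delta.get((state, letter))
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                parent[nxt] = (state, letter)
                queue.append(nxt)
    return None


# Hypothetical four-state chain: 0 -> 1 -> 2 -> 3 on either letter; 3 loops.
CHAIN = {(str(i), letter): str(min(i + 1, 3))
         for i in range(4) for letter in "ab"}

print(shortest_accepted(CHAIN, "0", {"3"}, "ab"))  # prints aaa
```

The answer has 3 = N − 1 letters, the bound in the text.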


This gives us a third method for determining whether an FA accepts any words. 

Method 3 

Test all words with fewer than N letters by running them on the FA. If the FA accepts none
of them, then it accepts no words at all. There are a predictable number of words to test, and
each word takes a finite predictable time to run, so this is an effective decision procedure.
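Method 3 translates almost word for word into Python. This sketch assumes a complete FA stored as a transition dictionary, and it takes the number of states N as a parameter; the chain machine is the same hypothetical example used for Theorem 17.

```python
import itertools
from typing import Dict, Set, Tuple


def accepts_some_word_by_testing(delta: Dict[Tuple[str, str], str],
                                 start: str, finals: Set[str],
                                 alphabet: str, n_states: int) -> bool:
    """Method 3: run every word with fewer than N letters on the FA."""
    for length in range(n_states):  # lengths 0, 1, ..., N - 1
        for letters in itertools.product(alphabet, repeat=length):
            state = start
            for letter in letters:
                state = delta[(state, letter)]
            if state in finals:
                return True
    return False


# Hypothetical four-state chain: 0 -> 1 -> 2 -> 3 on either letter; 3 loops.
CHAIN = {(str(i), letter): str(min(i + 1, 3))
         for i in range(4) for letter in "ab"}

print(accepts_some_word_by_testing(CHAIN, "0", {"3"}, "ab", 4))  # True
print(accepts_some_word_by_testing(CHAIN, "0", set(), "ab", 4))  # False
```

The number of words tested is $1 + |\Sigma| + \cdots + |\Sigma|^{N-1}$, which is predictable in advance (so the method is effective) but grows exponentially with N, a concrete instance of the efficiency question the next paragraph sets aside.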

These methods are all effective; the question of which is more efficient is a whole other
issue, one that we do not (often) raise in this book. As soon as we know that there is at least
one way to accomplish a certain task, we lose interest because our ultimate concern is the
question, “What can be done and what cannot?” The only motivation we have for
investigating alternative methods is that maybe they can be generalized to apply to new problems that
our first approach could not be extended to cover.


EXAMPLE


Let us illustrate the effective decision procedure described above that determines whether
two regular expressions are equivalent. We shall laboriously execute the entire process on a
very simple example. Let the two regular expressions be


Luckily, in this case we can understand that these two define the same language. Let us see
how the decision procedure proves this. Some machines for $FA_1$, $FA_1'$, $FA_2$, and $FA_2'$ are
shown below:


[Figures: machines for $FA_1$, $FA_1'$, $FA_2$, and $FA_2'$]




If we did not know how to produce these, algorithms in previous chapters would show us
how. We have labeled the states with the letters p, q, r, and s for clarity. Instead of using the
logical formula

$(L_1 \cap L_2') + (L_2 \cap L_1')$

we build our machine based on the equivalent set theory formula 

$(L_1' + L_2)' + (L_2' + L_1)'$

The machine for the first half of this formula is $(FA_1' + FA_2)'$:

[Figure: the machine for $(FA_1' + FA_2)'$]

The machine for the second half is $(FA_2' + FA_1)'$:

[Figure: the machine for $(FA_2' + FA_1)'$]


It was not an oversight that we failed to mark any of the states in these two machines
with a +. Neither machine has any final states. For $(FA_1' + FA_2)'$ to have a final state, the
machine $(FA_1' + FA_2)$ must have a nonfinal state. The start state for this machine is $q_1$ or $r_1$.
From there, if we read an a, we go to $q_1$ or $r_3$, and if we read instead a b, we go to $q_2$ or $r_2$. If
we ever get to $q_2$ or $r_2$, we must stay there. From $q_1$ or $r_3$ an input b takes us to $q_2$ or $r_2$ and
an input a leaves us at $q_1$ or $r_3$. All in all, from − we cannot get to any other combination of
states, such as the potential combinations ($q_2$ or $r_1$) and ($q_1$ or $r_2$). Now because $q_2$ is a + and
$r_1$ and $r_3$ are both +, all three states ($q_1$ or $r_1$, $q_1$ or $r_3$, and $q_2$ or $r_2$) are +, which means that
the complement has no final states.

The exact same thing is true for the machine for the second half of the formula. Clearly,
if we added these two machines together, we would get a machine with nine states and no
final state. Because it has no final state, it accepts no words and the two languages $L_1$ and $L_2$ are
equivalent. This ends the decision procedure. There are no words in one language that are not
in the other, so the two regular expressions define the same language and are equivalent. ■

This example is a paradigm for the general situation. The machine for $(L_1' + L_2)'$
accepts only those words in $L_1$ but not $L_2$. If the languages are in fact equal, this machine will
have no reachable final states. The same will be true for the machine for $(L_2' + L_1)'$. It will
never be necessary to combine these two machines, because if either accepts a word, then
$L_1 \neq L_2$.

When we listed three ways that a machine could accept no words, the first way was that
there be no final states and the second and third ways were that the final states not be
reachable from the start state. We counted these situations separately. When we form a machine
by adding two machines together, we do not usually bother describing the states that are not
reachable from the start state. The algorithm that we described in Chapter 7 never gets to
consider combinations of states of the component machines that are never referred to.
However, if we used a different algorithm, based on writing down the whole table of possible
combinations and then drawing edges between the resultant states as indicated, we would, in
this example, produce a picture with a final state but it would be unreachable from the start
state. In the preceding example, the full machine for $(FA_1' + FA_2)'$ is this:


[Figure: the full machine for $(FA_1' + FA_2)'$]



The only final state ($q_1$ or $r_2$) cannot be reached from anywhere; in particular, not from
the start state ($q_1$ or $r_1$). So, the machine accepts no words.

We can summarize what we have learned so far in the following theorem. 


THEOREM 18 

There is an effective procedure to decide whether: 

1. A given FA accepts any words. 

2. Two FAs are equivalent. 

3. Two regular expressions are equivalent. 

FINITENESS 






Let us now answer our last question of decidability. How can we tell whether an FA, or regular
expression, defines a finite language or an infinite language?

With regular expressions this is easy. The closure of any nonempty set, whether finite or
infinite, is itself infinite. Even the closure of one letter is infinite. Therefore, if when building
the regular expression from the recursive definition, we have ever had to use the closure
operator, the resulting language is infinite. This can be determined by scanning the expression
itself to see whether it contains the symbol *. If the regular expression does contain a *, then
the language is infinite. The one exception to this rule is Λ*, which is just Λ. This one
exception can, however, be very tricky. Of the two regular expressions

(Λ + aΛ*)(Λ* + Λ)* and (Λ + aΛ)*(Λ* + Λ)*
only the second defines an infinite language.

If the regular expression does not contain a *, then the language is necessarily finite. 
This is because the other rules of building regular expressions (any letter, sum, and product) 
cannot produce an infinite set from finite ones. Therefore, as we could prove recursively, the 
result must be finite. 

If we want to decide this question for an FA, we could first convert it to a regular expression. On the other hand, there are ways to determine whether an FA accepts an infinite language without having to perform the conversion.


THEOREM 19 

Let F be an FA with N states. Then:

1. If F accepts an input string w such that

$$N \le \text{length}(w) < 2N$$

then F accepts an infinite language.

2. If F accepts infinitely many words, then F accepts some word w such that

$$N \le \text{length}(w) < 2N$$

PROOF 

1. The first version of the pumping lemma assumed the language was infinite, but the second version did not require this, because a word is long enough to be pumped if it has at least as many letters as the FA has states. If there is some word w with N or more letters, then by the second version of the pumping lemma, we can break it into three parts:

w = xyz

The infinitely many different words xy^n z for n = 1, 2, 3, . . . are all accepted by F.

2. Now we are supposing that F does accept infinitely many words. Then it must accept a word so long that its path must contain a circuit, maybe several circuits. Each circuit can contain at most N states because F has only N states in total. Let us change the path of this long word by keeping the first circuit we come to and bypassing all the others. To bypass a circuit means to come up to it, go no more than part way around it, and leave at the first occurrence of the state from which the path previously exited.

This one-circuit path corresponds to some word accepted by F. The word has fewer than 2N letters, because the part of the path off the circuit visits each of at most N states once and so uses at most N - 1 edges, and the one circuit uses at most N edges. If the length of this word is N or more, then we have found a word whose length is in the range that the theorem specifies. If, on the other hand, the length of this word is less than N, we can increase it by looping around the one circuit until the length is at least N. The first time the length of the word (and path) reaches N or more, it is still less than 2N, because just before the last trip around the circuit the length was less than N, and one trip adds at most N letters. Eventually, we come to an accepted word with a length in the proper range. ■
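The circuit-finding step of part 1 can be made concrete. In this sketch `delta` is again a hypothetical dict from `(state, letter)` to the next state, and `split_at_first_circuit` is an illustrative name, not one from the text. It runs a word of at least N letters, stops at the first state it revisits, and returns x, y, z with y the letters read around that circuit:

```python
def run(start, finals, delta, word):
    """Return True if the FA ends in a final state after reading word."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

def split_at_first_circuit(start, delta, word):
    """Split word into x, y, z so that y takes the FA around the first circuit
    on the word's path. A word with at least N letters visits N + 1 states
    (counting the start), so with N states some state must repeat."""
    state, first_visit = start, {start: 0}
    for i, letter in enumerate(word, start=1):
        state = delta[(state, letter)]
        if state in first_visit:
            j = first_visit[state]
            return word[:j], word[j:i], word[i:]
        first_visit[state] = i
    raise ValueError("no state repeats; the word has fewer letters than states")
```

Pumping then means checking that x + y * n + z is accepted for n = 1, 2, 3, and so on.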
EXAMPLE

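Consider this example:

[Figure: an FA whose long accepted path passes through the circuits 2-3-4 and 5-6-7-8 before reaching the final state 9]

The first circuit is 2-3-4. It stays. The second circuit is 5-6-7-8. It is bypassed to become 5-6-7-9.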

The path that used to be 

1-2-3-4-2-3-5-6-7-8-5-6-7-8-5-6-7-9+

becomes 

1-2-3-4-2-3-5-6-7-9+

This path contains 11 states. The total machine has N states where N is at least 10. If 11 is not in the range of N to 2N, then continue to add three states by looping around 2-3-4 until the total path length is between N and 2N. ■

This theorem provides us with an effective procedure for determining whether F accepts a finite language or an infinite language. We simply test the finitely many strings whose lengths are at least N and less than 2N by running them on the machine and seeing whether any reach a final state. If none does, the language is finite. Otherwise, it is infinite.
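A brute-force sketch of this test, using the same hypothetical dict encoding of an FA; `itertools.product` enumerates every string of a given length over the alphabet. The running time grows exponentially with N, which is acceptable for an effective procedure but not for practical use:

```python
from itertools import product

def run(start, finals, delta, word):
    """Return True if the FA ends in a final state after reading word."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

def accepts_infinite_language(states, start, finals, delta, alphabet):
    """By Theorem 19, the language is infinite exactly when some accepted
    word w has N <= length(w) < 2N, where N is the number of states."""
    n = len(states)
    for length in range(n, 2 * n):
        for letters in product(alphabet, repeat=length):
            if run(start, finals, delta, "".join(letters)):
                return True
    return False
```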

THEOREM 20 

There is an effective procedure to decide whether a given FA accepts a finite or an infinite 
language. 

PROOF 

If the machine has N states and the alphabet has m letters, then in total there are

$$m^N + m^{N+1} + m^{N+2} + \cdots + m^{2N-1}$$

different input strings in the range

$$N \le \text{length of string} < 2N$$

We can test them all by running them on the machine. If any are accepted, the language is 
infinite. If none are accepted, the language is finite. 

It may often be more efficient to convert the FA to a regular expression, but so what? 

In the case where the machine has three states and the alphabet has two letters, the number of strings we have to test is

$$2^3 + 2^4 + 2^5 = 8 + 16 + 32 = 56$$

which is not too bad. However, an FA with three states can be converted into a regular expression in very few steps.
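The count is a geometric series, so it also has the closed form m^N (m^N - 1)/(m - 1) for m ≥ 2. This small sketch (function names are illustrative) computes it both ways:

```python
def strings_to_test(m, n):
    """Number of strings over an m-letter alphabet with n <= length < 2n."""
    return sum(m ** k for k in range(n, 2 * n))

def strings_to_test_closed_form(m, n):
    """Geometric-series form m^n (m^n - 1) / (m - 1), valid for m >= 2."""
    return m ** n * (m ** n - 1) // (m - 1)

print(strings_to_test(2, 3))  # 56
```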

PROBLEMS

For Problems 1 through 5, show by the method described in this chapter that the following 
pairs of FAs are equivalent: 
Why is this problem wrong? How can it be fixed?

6. Using the method of intersecting each machine with the complement of the other, show that

[figure]

do not accept the same language.

7. Using the method of intersecting each machine with the complement of the other, show that

[figure]

do not accept the same language.

8. List the 56 strings that will suffice to test whether a three-state FA over Σ = {a, b} accepts a finite language or an infinite one.
14. Without converting it into a regular expression or an FA, give an algorithm that decides
whether a TG accepts any words. 

15. Without converting it into a regular expression or an FA, give an algorithm that decides 
whether the language of an NFA is empty, finite, or infinite. 

16. Do the same as Problem 15 for NFA-Λ's. Be careful. The machine

[figure]

has an infinite language, whereas the machine

[figure]

has a one-word language.

17. Consider the following simplified algorithm to decide whether an FA with exactly N states has an empty language:

Step 1 Take the edges coming out of each final state and turn them into loops going 
back to the state they started from. 

Step 2 Relabel all edges with the letter a. (We now have an NFA.) 

Step 3 The original FA has a nonempty language if and only if this new NFA accepts the word a^N.

Illustrate this algorithm and prove it always works. 

Is this an effective procedure? 

18. By moving the start state, construct a decision procedure to determine whether a given 
FA accepts at least one word that starts with an a . 

19. (i) Construct a decision procedure to determine whether a given FA accepts at least one word that contains the letter b.

(ii) Construct a decision procedure to determine whether a given FA accepts some 
words of even length. 

20. Given two regular expressions r1 and r2, construct a decision procedure to determine whether the language of r1 is contained in the language of r2.
Because of the nature of early computer input devices, such as keypunches, paper tape, magnetic tape, and typewriters, it was necessary to develop a way of writing complicated algebraic expressions in one line of standard typewriter symbols. Some few new symbols could be invented if necessary, but the whole expression had to be encoded in a way that did not require a multilevel display or depend on the perception of spatial arrangement. Formulas had to be converted into linear strings of characters.

Several of the adjustments that had to be made were already in use in the scientific literature for various other reasons. For example, the use of the slash as a divide sign was already accepted by the mathematical public. Most publishers had special symbols for the popular fractions such as ½ and ¼, but eight-elevenths was customarily written as 8/11.

Still, before the days of the computer no one would ever have dreamed of writing a complicated compound fraction such as

$$\frac{\frac{1}{2} + 9}{4 + \frac{8}{21} + \frac{5}{3 + \frac{1}{2}}}$$

in the parentheses-laden one-line notation

((1/2) + 9)/(4 + (8/21) + (5/(3 + (1/2))))

The most important reason for not using the one-line version unless necessary is that in the two-dimensional version we can easily see that the number we are looking at is a little more than 9 divided by a little more than 5, so it obviously has a value between 1 and 2. Looking at the parentheses notation, we see that it is not even obvious which of the slash marks separates the numerator from the denominator of the major division.

How can a computer scan over this one-line string of typewriter characters and figure out what is going on? That is, how can a computer convert this string into its personal language of LOAD this, STORE that, and so on?
The conversion from a “high-level” language into a machine-executable language is done by a program called the compiler. This is a superprogram. Its input data are other programs. It processes them and prints out an equivalent program written in machine or assembler language. To do this, it must figure out in what order to perform the complicated set of arithmetic operations that it finds written out in the one-line formula. It must do this in a mechanical, algorithmic way. It cannot just look at the expression and understand it. Rules must be given by which this string can be processed: rules, perhaps, like those the machines of Part I could follow.

Along with evaluating those input strings that do have a meaning, we want our machine to be able to reject strings of symbols that make no sense as arithmetic expressions, such as “((9)+”. This input string should not take us to a final state in the machine. However, we cannot know that this is a bad input string until we have reached the last letter. If the + were changed to a ), the formula would be valid. An FA that translated expressions into instructions simultaneously as it scanned left to right, like a Mealy machine, would already be turning out code before it realized that the whole expression is nonsense.

Before we try to build a compiling machine, let us return to the discussion of what is and 
what is not a valid arithmetic expression as defined in Chapter 3 by recursive definition (p. 25). 

Rule 1 Any number is in the set AE. 

Rule 2 If x and y are in AE, then so are 

(x) -(x) (x + y) (x-y) (x*y) (x/y) (x**y) 

This time we have included parentheses around every component factor. This avoids the ambiguity of expressions like 3 + 4*5 and 8/4/2 by making them illegal. We shall present a more forgiving definition of this language later.

First, we must design a machine that can figure out how a given input string was built up from these basic rules. Then we should be able to translate this sequence of rules into an assembler language program, because all these rules are pure assembler language instructions (with the exception of exponentiation, which presents a totally different problem, but because this is not a course in compiler design, we ignore this embarrassing fact).

For example, if we present the input string 

((3 + 4) * (6 + 7)) 

and the machine discovers that the way this can be produced from the rules is by the sequence 

3 is in AE 

4 is in AE 

(3 + 4) is in AE 

6 is in AE 

7 is in AE 

(6 + 7) is in AE 

((3 + 4) * (6 + 7)) is in AE 

we can therefore algorithmically convert this into 

LOAD 3 in register 1 
LOAD 4 in register 2 

ADD the contents of register 2 into register 1 
LOAD 6 in register 3 
LOAD 7 in register 4 

ADD the contents of register 3 into register 4 
MULTIPLY register 1 by register 4 

or some such sequence of instructions depending on the architecture of the particular machine (not all computers have so many arithmetic registers or allow multiplication).
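The recognition half of this job can be sketched with recursive descent, a standard technique that the text has not introduced here: one function recognizes one AE, calling itself for the parts x and y of Rule 2. On success it records the “... is in AE” facts in the same order as the list above and emits instructions as it goes. The names `tokenize`, `AEParser`, and `parse`, and the instruction format, are hypothetical; each result stays in the left operand's register, which differs slightly from the hand-written listing. A rejected string raises `SyntaxError`, and any instructions emitted before the error are discarded:

```python
import re

# Tokens: numbers, the two-character operator **, and single-character symbols.
TOKEN = re.compile(r"\s*(\d+|\*\*|[-+*/()])")
OPS = {"+": "ADD", "-": "SUBTRACT", "*": "MULTIPLY", "/": "DIVIDE", "**": "POWER"}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class AEParser:
    """Recursive-descent recognizer for the fully parenthesized AE rules."""

    def __init__(self, text):
        self.tokens, self.i = tokenize(text), 0
        self.steps = []      # "x is in AE" facts, innermost pieces first
        self.code = []       # "OP register r, register s" means r := r OP s
        self.registers = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.i += 1

    def ae(self):
        """Recognize one AE; return its text and the register holding its value."""
        token = self.peek()
        if token is not None and token.isdigit():        # Rule 1: a number
            self.i += 1
            self.registers += 1
            reg = self.registers
            self.code.append(f"LOAD {token} in register {reg}")
            text = token
        elif token == "-":                               # Rule 2: -(x)
            self.i += 1
            self.expect("(")
            inner, reg = self.ae()
            self.expect(")")
            self.code.append(f"NEGATE register {reg}")
            text = f"-({inner})"
        else:                                            # Rule 2: (x) or (x op y)
            self.expect("(")
            left, reg = self.ae()
            op = self.peek()
            if op in OPS:
                self.i += 1
                right, reg2 = self.ae()
                self.code.append(f"{OPS[op]} register {reg}, register {reg2}")
                text = f"({left} {op} {right})"
            else:
                text = f"({left})"
            self.expect(")")
        self.steps.append(f"{text} is in AE")
        return text, reg


def parse(text):
    """Return (steps, code), or raise SyntaxError if text is not in AE."""
    parser = AEParser(text)
    parser.ae()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r} after a complete AE")
    return parser.steps, parser.code
```

For ((3 + 4) * (6 + 7)) the steps come out exactly as listed above, ending with MULTIPLY register 1, register 3; both “((9)+” and 3 + 4*5 raise SyntaxError.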
The hard part of the problem is to figure out by mechanical means how the input string can be produced from the rules. The second part, converting the sequence of rules that creates the expression into a computer program that evaluates the expression, is easy.

The designers of the first high-level languages realized that the problem of interpreting algebra is analogous to the problem humans face hundreds of times every day when they decipher the grammatical structure of sentences that they hear or read in English. Here, we have again the ever-present parallelism: Recognizing the structure of a computer language instruction is analogous to recognizing the structure of a sentence in a human language.

Elementary school used to be called grammar school because one of the most important subjects taught was English grammar. A grammar is the set of rules by which the valid sentences in a language are constructed. The rules by which sentences are made are an example of an organically evolved recursive definition. Our ability to understand what a sentence means is based on our ability to understand how it could be formed from the rules of grammar. Determining how a sentence can be formed from the rules of grammar is called parsing the sentence.

When we hear or read a sentence in our native language, we do not go through a conscious act of parsing. Exactly why this is the case is a question for other sciences. Perhaps it is because we learned to speak as infants by a trial-and-error method that was not as mathematical and rigorous as the way in which we learn foreign languages later in life. When we were born, we spoke no language in which the grammar of our native tongue could be described to us. However, when we learn a second language, the rules of grammar for that language can be explained to us in English. How we can possibly learn our first language is a problem discussed by linguists, psychologists, philosophers, and worried parents. Whether the way we teach computers to speak is the same as the way humans learn is an interesting question, but beyond our present mandate.

Even though human languages have rules of grammar that can be stated explicitly, it is still true that many invalid sentences, those that are not, strictly speaking, grammatical, can be understood. Perhaps this is because there are tacit alternative rules of grammar that, although not taught in school, nevertheless are rules people live by. But this will not concern us either. No computer yet can forgive the mess, “Let x equal two times the radius times that funny-looking Greek letter with the squiggly top that sounds like a pastry, you know what I mean?” The rules of computer language grammar are prescriptive: no ungrammatical strings are accepted.

Because the English word “grammar” can mean the study of grammar as well as the set of rules themselves, we sometimes refer to the set of rules as forming a generative grammar. This emphasizes the point that from them and a dictionary (the alphabet) we can generate all the sentences (words) in the language.

Let us look at the rule in English grammar that allows us to form a sentence by juxtaposing a noun and a verb (assuming that the verb is in the correct person and number). We might produce

Birds sing. 

However, using the same rule might also produce 

Wednesday sings. or Coal mines sing. 

If these are not meant to be poetical or metaphoric, they are just bad sentences. They violate a different kind of rule of grammar, one that takes into account the meaning of words as well as their person, number, gender, and case.

Rules that involve the meaning of words we call semantics and rules that do not involve the meaning of words we call syntax. In English, the meaning of words can be relevant, but in arithmetic the meaning of numbers is rarely cataclysmic. In the high-level computer languages, one number is as good as another. If

X = B + 9

is a valid formulation, then so are 

X = B + 8 X = B + 473 X = B + 9999 

So long as the constants do not become so large that they are out of range, we do not try to divide by 0 or take the square root of a negative number, and we do not mix fixed-point numbers with floating-point numbers in bad ways, one number is as good as another. It could be argued that such rules as “thou shalt not divide by zero” as well as the other restrictions mentioned are actually semantic laws, but this is another interesting point that we shall not discuss. In general, the rules of computer language grammar are all syntactic and not semantic, which makes the task of interpretation much easier.

There is another way in which the parsing of arithmetic expressions is easier than the parsing of English sentences. To parse the English sentence, “Birds sing.”, it is necessary to look up in the dictionary whether “birds” is a noun or a verb. To parse the arithmetic expression “(3 + 5)*6”, it is not necessary to know any other characteristics of the numbers 3, 5, and 6. We shall see more differences between simple languages and hard languages as we progress.

Let us go back to the analogy between computer languages and English. Some of the 
rules of English grammar are these: 

1. A sentence can be a subject followed by a predicate.

2. A subject can be a noun-phrase.

3. A noun-phrase can be an adjective followed by a noun-phrase.

4. A noun-phrase can be an article followed by a noun-phrase.

5. A noun-phrase can be a noun.

6. A predicate can be a verb followed by a noun-phrase.

7. A noun can be apple, bear, cat, or dog.

8. A verb can be eats, follows, gets, or hugs.

9. An adjective can be itchy or jumpy.

10. An article can be a, an, or the.

Let us, for the moment, restrict the possibility of forming sentences to the laws stated above. Within this small model of English, there are hundreds of sentences we can form, for example,

The itchy bear hugs the jumpy dog. 

The method by which this sentence can be generated is outlined here: 


sentence => subject predicate                                             Rule 1
=> noun-phrase predicate                                                  Rule 2
=> noun-phrase verb noun-phrase                                           Rule 6
=> article noun-phrase verb noun-phrase                                   Rule 4
=> article adjective noun-phrase verb noun-phrase                         Rule 3
=> article adjective noun verb noun-phrase                                Rule 5
=> article adjective noun verb article noun-phrase                        Rule 4
=> article adjective noun verb article adjective noun-phrase              Rule 3
=> article adjective noun verb article adjective noun                     Rule 5
=> the adjective noun verb article adjective noun                         Rule 10
=> the itchy noun verb article adjective noun                             Rule 9
=> the itchy bear verb article adjective noun                             Rule 7
=> the itchy bear hugs article adjective noun                             Rule 8
=> the itchy bear hugs the adjective noun                                 Rule 10
=> the itchy bear hugs the jumpy noun                                     Rule 9
=> the itchy bear hugs the jumpy dog                                      Rule 7
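Because each step is only a substitution, this derivation can be replayed mechanically. The sketch below stores Rules 1 through 10 in a hypothetical dict `GRAMMAR` and, at each step, replaces the leftmost occurrence of the named nonterminal, refusing any replacement that is not one of the rules. Choosing the leftmost occurrence is an assumption of the sketch; it happens to reproduce every line above:

```python
GRAMMAR = {
    "sentence": [["subject", "predicate"]],                 # Rule 1
    "subject": [["noun-phrase"]],                           # Rule 2
    "noun-phrase": [["adjective", "noun-phrase"],           # Rule 3
                    ["article", "noun-phrase"],             # Rule 4
                    ["noun"]],                              # Rule 5
    "predicate": [["verb", "noun-phrase"]],                 # Rule 6
    "noun": [["apple"], ["bear"], ["cat"], ["dog"]],        # Rule 7
    "verb": [["eats"], ["follows"], ["gets"], ["hugs"]],    # Rule 8
    "adjective": [["itchy"], ["jumpy"]],                    # Rule 9
    "article": [["a"], ["an"], ["the"]],                    # Rule 10
}

def substitute(working, nonterminal, replacement):
    """Replace the leftmost occurrence of nonterminal, if a rule allows it."""
    if replacement not in GRAMMAR[nonterminal]:
        raise ValueError(f"no rule {nonterminal} -> {' '.join(replacement)}")
    k = working.index(nonterminal)
    return working[:k] + replacement + working[k + 1:]

def derive(steps, start="sentence"):
    """Apply (nonterminal, replacement) steps; return every working string."""
    working = [start]
    history = [" ".join(working)]
    for nonterminal, replacement in steps:
        working = substitute(working, nonterminal, replacement.split())
        history.append(" ".join(working))
    return history

STEPS = [("sentence", "subject predicate"), ("subject", "noun-phrase"),
         ("predicate", "verb noun-phrase"), ("noun-phrase", "article noun-phrase"),
         ("noun-phrase", "adjective noun-phrase"), ("noun-phrase", "noun"),
         ("noun-phrase", "article noun-phrase"), ("noun-phrase", "adjective noun-phrase"),
         ("noun-phrase", "noun"), ("article", "the"), ("adjective", "itchy"),
         ("noun", "bear"), ("verb", "hugs"), ("article", "the"),
         ("adjective", "jumpy"), ("noun", "dog")]

for line in derive(STEPS):
    print(line)
```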

A law of grammar is in reality a suggestion for possible substitutions. The arrow (=>) indicates that a substitution was made according to the preceding rules of grammar. What happened above is that we started out with the initial symbol sentence. We then applied the rules for producing sentences listed in the generative grammar. In most cases, we had some choice in selecting which rule we wanted to apply. There is a qualitative distinction between the word “noun” and the word “bear.” To show this, we have underlined the words that stand for parts of speech and are not to be considered themselves as words for the finished sentences. Of course, in the complete set of rules for English the words “verb,” “adjective,” and so on, are all perfectly good words and would be included in our final set of rules as usable words. They are all nouns. But in this model the term verb is a transitory place holder. It means, “stick a verb here.” It must eventually be replaced to form a finished sentence.

Once we have put in the word “bear,” we are stuck with it. No rule of grammar says that a bear can be replaced by anything else. The words that cannot be replaced by anything are called terminals. Words that must be replaced by other things we call nonterminals. We will give a more general definition of this shortly. The job of sentence production is not complete until all the nonterminals have been replaced with terminals.

Midway through the production procedure, we developed the sentence into as many nonterminals as it was going to become.

article adjective noun verb article adjective noun

From this point on, the procedure was only one of selecting which terminals were to be inserted in place of the nonterminals. This middle stage in which all the terminals are identified by their nonterminal names is the “grammatical parse” of the sentence. We can tell what noun each adjective modifies because we know how it got into the sentence in the first place. We know which noun-phrase produced it. “Itchy” modifies “bear” because “itchy” and the noun-phrase that became “bear” were introduced together by the same application of Rule 3.

We have allowed a noun-phrase to be an adjective followed by a noun-phrase. This could lead to

noun-phrase => adjective noun-phrase
=> adjective adjective noun-phrase
=> adjective adjective adjective noun-phrase
=> adjective adjective adjective noun
=> itchy adjective adjective noun
=> itchy itchy adjective noun
=> itchy itchy itchy noun
=> itchy itchy itchy bear

If we so desired, we could produce 50 itchy’s. Using the Kleene closure operator, we 
could write something like 

noun-phrase => adjective* noun
But now, we are getting ahead of ourselves. 

The rules we have given for this simplified version of English allow for many dumb sentences, such as

Itchy the apple eats a jumpy jumpy jumpy bear. 

Because we are not considering the limitations of semantics, diction, or good sense, we must 
consider this string of terminals as a legitimate sentence. This is what we mean by the phrase 
“formal language,” which we used in Part I. It is a funny phrase because it sounds as if we 
mean the stuffy language used in aristocratic or diplomatic circles. In our case, it means only 
that any string of symbols satisfying the rules of grammar (syntax alone) is as good as any 
other. The word “formal” here means “strictly formed by the rules,” not “highly proper.” The 
Queen of England is unlikely to have made the remark above about itchy the apple. 

We can follow the same model for defining arithmetic expressions. We can write the 
whole system of rules of formation as the list of possible substitutions shown below: 

Start -> (AE)

AE -> (AE + AE)

AE -> (AE - AE)

AE -> (AE * AE)

AE -> (AE / AE)

AE -> (AE ** AE)

AE -> (AE)

AE -> -(AE)

AE -> ANY-NUMBER

Here, we have used the word “Start” to begin the process, as we used the symbol “sentence” in the sample of English. Aside from Start, the only other nonterminal is AE. The terminals are the phrase “ANY-NUMBER” and the symbols

+   -   *   /   **   (   )

Either we could be satisfied that we know what is meant by the words “any number,” or 
else we could define this phrase by a set of rules, thus converting it from a terminal into a 
nonterminal. 

Rule 1 ANY-NUMBER -> FIRST-DIGIT

Rule 2 FIRST-DIGIT -> FIRST-DIGIT OTHER-DIGIT

Rule 3 FIRST-DIGIT -> 1 2 3 4 5 6 7 8 9

Rule 4 OTHER-DIGIT -> 0 1 2 3 4 5 6 7 8 9

Rules 3 and 4 offer choices of terminals. We put spaces between them to indicate 
“choose one,” but we soon shall introduce another disjunctive symbol. 

We can produce the number 1066 as follows:

ANY-NUMBER => FIRST-DIGIT                                       Rule 1
=> FIRST-DIGIT OTHER-DIGIT                                      Rule 2
=> FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT                          Rule 2
=> FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT              Rule 2
=> 1066                                                         Rules 3 and 4

Here, we have made all our substitutions of terminals for nonterminals in one swoop, but without any possible confusion. One thing we should note about the definition of AE is that some of the grammatical rules involve both terminals and nonterminals together. In English, the rules were either of the form

One Nonterminal -> string of Nonterminals

or

One Nonterminal -> choice of terminals

In our present study, we shall see that the form of the rules in the grammar has great significance.
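The derivation of 1066 follows a fixed pattern: one use of Rule 2 for each digit after the first, then Rules 3 and 4. A minimal sketch (the function name is illustrative) that builds the same working strings, and that also shows a consequence of Rule 3: these rules cannot produce 0 or any number with a leading 0:

```python
def derive_number(digits):
    """Working strings deriving digits from ANY-NUMBER, or None if impossible."""
    if not digits or any(c not in "0123456789" for c in digits) or digits[0] == "0":
        return None  # Rule 3 never yields 0, so no leading zero (and no Λ)
    working = ["FIRST-DIGIT"]
    lines = ["ANY-NUMBER", " ".join(working)]  # Rule 1
    for _ in digits[1:]:
        # Rule 2 puts a new OTHER-DIGIT right after FIRST-DIGIT; since all the
        # OTHER-DIGITs look alike, appending gives the same working string.
        working.append("OTHER-DIGIT")
        lines.append(" ".join(working))
    lines.append(digits)  # Rules 3 and 4, applied all at once
    return lines
```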

The sequence of applications of the rules that produces the finished string of terminals from the starting symbol is called a derivation or a generation of the word. The grammatical rules are often referred to as productions. They all indicate possible substitutions. The derivation may or may not be unique, which means that by applying productions to the start symbol in two different ways, we may still produce the same finished product.

We are now ready to define the general concept of which all these examples have been special cases. We call this new structure a context-free grammar, or CFG. The full meaning of the term “context-free” will be made clear later. The concept of CFGs was invented by the linguist Noam Chomsky in 1956. Chomsky gave several mathematical models for languages, and we shall see more of his work later.


SYMBOLISM FOR GENERATIVE GRAMMARS


DEFINITION 


A context-free grammar, CFG, is a collection of three things:

1. An alphabet Σ of letters called terminals from which we are going to make strings that will be the words of a language.

2. A set of symbols called nonterminals, one of which is the symbol S, standing for “start here.”

3. A finite set of productions of the form

One Nonterminal -> finite string of terminals and/or Nonterminals

where the strings of terminals and nonterminals can consist of only terminals or of only nonterminals, or of any mixture of terminals and nonterminals or even the empty string. We require that at least one production has the nonterminal S as its left side.


So as not to confuse terminals and nonterminals, we always insist that nonterminals be designated by capital letters, whereas terminals are usually designated by lowercase letters and special symbols. In our example for English, we underlined the nonterminals, but this treatment is more standard.


DEFINITION


The language generated by a CFG is the set of all strings of terminals that can be produced 
from the start symbol S using the productions as substitutions. A language generated by a 
CFG is called a context-free language, abbreviated CFL. ■ 

There is no great uniformity of opinion among experts about the terminology to be used 
here. The language generated by a CFG is sometimes called the language defined by the CFG, 
the language derived from the CFG, or the language produced by the CFG. This is similar to 
the problem with regular expressions. We should say “the language defined by the regular expression,” although the phrase “the language of the regular expression” has a clear meaning.

EXAMPLE 

Let the only terminal be a and the productions be 

Prod 1 S -> aS
Prod 2 S -> Λ

If we apply production 1 six times and then apply production 2, we generate the following:

S => aS
=> aaS
=> aaaS
=> aaaaS
=> aaaaaS
=> aaaaaaS
=> aaaaaaΛ
= aaaaaa

This is a derivation of a^6 in this CFG. The string a^n comes from n applications of production 1 followed by one application of production 2. If we apply production 2 without production 1, we find that the null string is itself in the language of this CFG. Because the only terminal is a, it is clear that no words outside of a* can possibly be generated. The language generated by this CFG is exactly a*. ■
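Claims like this can be spot-checked by listing the short words a grammar generates. The sketch below is an illustration, not a construction from the text. It assumes a grammar encoded as a dict from single uppercase nonterminals to right-hand sides, with the empty string standing for Λ. It expands the leftmost nonterminal breadth-first and prunes working strings that have more than n terminals or more than n + slack symbols. Every word it returns is in the language, but a derivation that needed a longer working string would be missed, so `slack` may need raising for other grammars:

```python
from collections import deque

def words_up_to(grammar, n, start="S", slack=2):
    """Terminal words of length <= n reachable by leftmost derivations."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        w = queue.popleft()
        k = next((i for i, c in enumerate(w) if c in grammar), None)
        if k is None:                       # no nonterminals left: a finished word
            found.add(w)
            continue
        for rhs in grammar[w[k]]:           # rhs == "" plays the role of Λ
            nxt = w[:k] + rhs + w[k + 1:]
            terminals = sum(c not in grammar for c in nxt)
            if terminals <= n and len(nxt) <= n + slack and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found, key=lambda s: (len(s), s))

print(words_up_to({"S": ["aS", ""]}, 4))  # ['', 'a', 'aa', 'aaa', 'aaaa']
```

The same function applies to the grammars of the next examples, for instance {"S": ["SS", "a", ""]}.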

In the examples above, we used two different arrow symbols. The symbol “->” we employ exclusively in the statement of the productions. It means “can be replaced by,” as in S -> aS. The other arrow symbol “=>” we employ between the unfinished stages in the generation of our word. It means “can develop into,” as in aaS => aaaS. These “unfinished stages” are strings of terminals and nonterminals that we shall call working strings.

Notice that in this last example we have both S -> aS as a production in the abstract and S => aS as the first step in a particular derivation.


EXAMPLE 

Let the only terminal be a and the productions be 

Prod 1 S -> SS
Prod 2 S -> a
Prod 3 S -> Λ

In this language, we can have the following derivation:

S => SS
=> SSS
=> SaS
=> SaSS
=> ΛaSS
=> ΛaaS
=> ΛaaΛ


The language generated by this set of productions is also just the language a*, but in this case the string aa can be obtained in many (actually infinitely many) ways. In the first example, there was a unique way to produce every word in the language. This also illustrates that the same language can have more than one CFG generating it. Notice above that there are two ways to go from SS to SSS: either of the two S's can be doubled.

In the previous example, the only terminal is a and the only nonterminal is S. What then is Λ? It is not a nonterminal, because there is no production of the form

Λ -> something

Yet, it is not a terminal, because it vanishes from the finished string ΛaaΛ = aa. As always, Λ is a very special symbol and has its own status. In the definition of a CFG, we said a nonterminal could be replaced by any string of terminals and/or nonterminals, even the empty string. To replace a nonterminal by Λ is to delete it without leaving any tangible remains. For the nonterminal N, the production

N -> Λ

means that whenever we want, N can simply be deleted from any place in a working string.


EXAMPLE 


Let the terminals be a and b, the only nonterminal be S, and the productions be 

Prod 1 S -> aS
Prod 2 S -> bS
Prod 3 S -> a
Prod 4 S -> b

We can produce the word baab as follows:

S => bS (by Prod 2)
=> baS (by Prod 1)
=> baaS (by Prod 1)
=> baab (by Prod 4)

The language generated by this CFG is the set of all possible strings of the letters a and b except for the null string, which we cannot generate.

We can generate any word by the following algorithm:

At the beginning, the working string is the start symbol S. Select a word to be generated. Read the letters of the desired word from left to right one at a time. If an a is read that is not the last letter of the word, apply Prod 1 to the working string. If a b is read that is not the last letter of the word, apply Prod 2 to the working string. If the last letter is read and it is an a, apply Prod 3 to the working string. If the last letter is read and it is a b, apply Prod 4 to the working string.

At every stage in the derivation before the last, the working string has the form

(string of terminals) S

At every stage in the derivation, to apply a production means to replace the final nonterminal S. Productions 3 and 4 can be used only once and only one of them can be used. For example, to generate babb, we apply in order productions 2, 1, 2, 4, as below:

S => bS => baS => babS => babb
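The left-to-right algorithm above is short enough to state directly as code. In this minimal sketch the function names are illustrative, and the numbers refer to Prod 1 through Prod 4 of this example:

```python
def productions_for(word):
    """Production numbers deriving a nonempty word over {a, b}: Prod 1 (S -> aS)
    or Prod 2 (S -> bS) for each letter but the last, then Prod 3 (S -> a)
    or Prod 4 (S -> b) for the last letter."""
    if not word or set(word) - {"a", "b"}:
        raise ValueError("only nonempty strings of a's and b's can be derived")
    sequence = [1 if c == "a" else 2 for c in word[:-1]]
    sequence.append(3 if word[-1] == "a" else 4)
    return sequence

def working_strings(word):
    """The working strings of that derivation: (string of terminals) S, then the word."""
    return ["S"] + [word[:i] + "S" for i in range(1, len(word))] + [word]

print(productions_for("babb"))  # [2, 1, 2, 4]
```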


EXAMPLE 

Let the terminals be a and b, the nonterminals be S, X, and Y, and the productions be

S -> X
S -> Y
X -> Λ
Y -> aY
Y -> bY
Y -> a
Y -> b

All the words in this language are either of type X, if the first production in their derivation is

S -> X

or of type Y, if the first production in their derivation is

S -> Y

The only possible continuation for words of type X is the production

X -> Λ

Therefore, Λ is the only word of type X.

The productions whose left side is Y form a collection identical to the productions in the previous example except that the start symbol S has been replaced by the symbol Y. We can carry on from Y the same way we carried on from S before. This does not change the language generated, which contains only strings of terminals. Therefore, the words of type Y are exactly the same as the words in the previous example. That means that any string of a's and b's except the null string can be produced from Y as these strings were produced before from S.

Putting together the type X and the type Y words, we see that the total language generated by this CFG is all strings of a's and b's, null or otherwise. The language generated is (a + b)*. ■




EXAMPLE 

Let the terminals be a and b, the only nonterminal be S, and the productions be

S → aS
S → bS
S → a
S → b
S → Λ

The word ab can be generated by the derivation

S ⇒ aS ⇒ abS ⇒ abΛ = ab

or by the derivation

S ⇒ aS ⇒ ab

The language of this CFG is also (a + b)*, but the sequence of productions that is used to
generate a specific word is not unique.

If we deleted the third and fourth productions, the language generated would be the same.
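The non-uniqueness can be made concrete by listing every derivation of a word. A small sketch, assuming the five productions S → aS | bS | a | b | Λ with Λ written as the empty string; `derivations` is a hypothetical helper that returns each derivation as its list of working strings.

```python
def derivations(word):
    """Every derivation of word from S -> aS | bS | a | b | Λ."""
    results = []

    def extend(done, steps):
        # The working string is done + "S"; `rest` is what S must still produce.
        rest = word[len(done):]
        if rest == "":
            results.append(steps + [word])   # finish with S -> Λ
            return
        if len(rest) == 1:
            results.append(steps + [word])   # finish with S -> a or S -> b
        nxt = done + rest[0]
        extend(nxt, steps + [nxt + "S"])     # continue with S -> aS or S -> bS

    extend("", ["S"])
    return results


for d in derivations("ab"):
    print(" => ".join(d))
# S => aS => ab
# S => aS => abS => ab
```

Every nonempty word has exactly two derivations here: one that ends with S → a or S → b, and one that ends with S → Λ.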


EXAMPLE 


Let the terminals be a and b, the nonterminals be S and X, and the productions be

S → XaaX
X → aX
X → bX
X → Λ

We already know from the previous example that the last three productions will allow us
to generate any word we want from the nonterminal X. If the nonterminal X appears in any
working string, we can apply productions to turn it into any string we want. Therefore, the
words generated from S have the form

anything aa anything

or

(a + b)*aa(a + b)*

which is the language of all words with a double a in them somewhere.

For example, to generate baabaab, we can proceed as follows: 

S ⇒ XaaX ⇒ bXaaX ⇒ baXaaX ⇒ baaXaaX ⇒ baabXaaX
⇒ baabΛaaX ⇒ baabaaX ⇒ baabaabX ⇒ baabaabΛ = baabaab

There are other sequences that also derive the word baabaab. 
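For short words the claim can be checked by brute force: enumerate everything the grammar derives up to a length bound and compare it with the words matched by the regular expression. This is a sketch under stated assumptions: nonterminals are single uppercase letters, Λ is the empty string, and `words_up_to` is our own helper. The search is finite here because no production increases the number of nonterminals, so bounding the number of letters bounds the working strings.

```python
import re
from collections import deque
from itertools import product

# S -> XaaX,  X -> aX | bX | Λ   (Λ is written as "")
GRAMMAR = {"S": ["XaaX"], "X": ["aX", "bX", ""]}


def words_up_to(grammar, start, n):
    """Words of at most n letters reachable from start by leftmost derivations."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        working = queue.popleft()
        # position of the leftmost nonterminal, or None if this is already a word
        pos = next((i for i, c in enumerate(working) if c in grammar), None)
        if pos is None:
            words.add(working)
            continue
        for rhs in grammar[working[pos]]:
            nxt = working[:pos] + rhs + working[pos + 1:]
            if sum(c not in grammar for c in nxt) <= n and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


n = 6
double_a = {
    w
    for k in range(n + 1)
    for w in ("".join(p) for p in product("ab", repeat=k))
    if re.fullmatch(r"(a|b)*aa(a|b)*", w)
}
print(words_up_to(GRAMMAR, "S", n) == double_a)  # True
```

Such a check only covers words up to the chosen length; the argument in the text is what covers all of them.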


EXAMPLE 



Let the terminals be a and b, the nonterminals be S, X, and Y, and the productions be

S → XY
X → aX
X → bX
X → a
Y → Ya
Y → Yb
Y → a

What can be derived from X? Let us look at the X productions alone:

X → aX
X → bX
X → a

Beginning with the nonterminal X and starting a derivation using the first two productions, we 
always keep a nonterminal X on the right end. To get rid of the X for good, we must eventually replace it with an a by the third production. We can see that any string of terminals that
comes from X must end in an a and any word ending in an a can be derived from X in a
unique fashion. For example, to derive the word babba from X, we must proceed as follows: 

X ⇒ bX ⇒ baX ⇒ babX ⇒ babbX ⇒ babba

Similarly, the words that can be derived from Y are exactly those that begin with an a. 
To derive abbab, for example, we can proceed: 

Y ⇒ Yb ⇒ Yab ⇒ Ybab ⇒ Ybbab ⇒ abbab

When an X-part is concatenated with a Y-part, a double a is formed.

We can conclude that starting from S, we can derive only words with a double a in 
them, and all these words can be derived. 

For example, to derive babaabb, first we know that the X-part must end at the first a of
the double a and that the Y-part must begin with the second a of the double a:

S ⇒ XY ⇒ bXY ⇒ baXY ⇒ babXY ⇒ babaY
⇒ babaYb ⇒ babaYbb ⇒ babaabb

Therefore, this grammar generates the same language as the last, although it has more nonterminals and more productions. ■
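The splitting argument can be turned into a procedure that writes out the derivation of any word containing a double a. A sketch, assuming the word is over {a, b} and that we split at the first occurrence of aa; `derive_from_S` is a hypothetical helper. The X-part grows left to right (X → aX, X → bX, then X → a) and the Y-part grows right to left (Y → Ya, Y → Yb, then Y → a).

```python
def derive_from_S(word):
    """Derivation of word in S -> XY, X -> aX | bX | a, Y -> Ya | Yb | a.

    The X-part ends at the first a of the first double a; the Y-part
    starts with the second a.
    """
    i = word.find("aa")
    if i < 0 or set(word) - {"a", "b"}:
        raise ValueError("word must be over {a, b} and contain a double a")
    x_part, y_part = word[: i + 1], word[i + 1:]
    steps = ["S", "XY"]
    # X -> aX or X -> bX for each letter of the X-part but the last, then X -> a
    for k in range(1, len(x_part)):
        steps.append(x_part[:k] + "XY")
    steps.append(x_part + "Y")
    # Y -> Ya or Y -> Yb for each letter after the leading a, then Y -> a
    for k in range(1, len(y_part)):
        steps.append(x_part + "Y" + y_part[len(y_part) - k:])
    steps.append(word)
    return steps


print(" => ".join(derive_from_S("babaabb")))
# S => XY => bXY => baXY => babXY => babaY => babaYb => babaYbb => babaabb
```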


EXAMPLE 

Let the terminals be a and b and the three nonterminals be S, BALANCED, and UNBALANCED. We treat these nonterminals as if they were each a single symbol and nothing more
confusing. Let the productions be

S → SS
S → BALANCED S
S → S BALANCED
S → Λ
S → UNBALANCED S UNBALANCED
BALANCED → aa
BALANCED → bb
UNBALANCED → ab
UNBALANCED → ba
We shall show that the language generated from these productions is the set of all words
with an even number of a's and an even number of b's. This is our old friend, the language
EVEN-EVEN.

To prove this, we must show two things: that all the words in EVEN-EVEN can be generated
from these productions and that every word generated from these productions is, in
fact, in the language EVEN-EVEN.

First, we show that every word in EVEN-EVEN can be generated by these productions.
From our earlier discussion of the language EVEN-EVEN, we know that every word in this
language can be written as a collection of substrings of

type aa or type bb or type (ab + ba)(aa + bb)*(ab + ba)

All three types can be generated from the nonterminal S from the preceding productions. The various substrings can be put together by repeated application of the production

S → SS

This production is very useful. If we apply it four times, we can turn one S into five S's.
Each of these S's can be a syllable of any of the three types. For example, the EVEN-EVEN
word aababbab can be produced as follows:

S ⇒ BALANCED S
⇒ aaS
⇒ aa UNBALANCED S UNBALANCED
⇒ aa ba S UNBALANCED
⇒ aa ba S ab
⇒ aa ba BALANCED S ab
⇒ aa ba bb S ab
⇒ aa ba bb Λ ab
= aababbab

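To see that all the words that are generated by these productions are in the language EVEN-EVEN, we need only to observe that the unbalanced pairs are only added into the working
string by one production and then they enter two at a time. Each balanced pair adds two a's or
two b's and each unbalanced pair adds one a and one b, so both counts always stay even.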

Therefore, the language generated by this CFG is exactly EVEN-EVEN. 
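The first half of the proof rests on splitting an EVEN-EVEN word into syllables. A minimal sketch of that split, assuming the input is a string over {a, b}; `syllables` and `is_even_even` are hypothetical helpers. The word is read two letters at a time: a balanced pair stands alone as a syllable, and an unbalanced pair opens a syllable that the next unbalanced pair closes.

```python
from itertools import product


def syllables(word):
    """Split word into syllables aa, bb, and (ab+ba)(aa+bb)*(ab+ba); None if impossible."""
    if len(word) % 2:
        return None
    result, open_syllable = [], None
    for i in range(0, len(word), 2):
        pair = word[i:i + 2]
        unbalanced = pair in ("ab", "ba")
        if open_syllable is None:
            if unbalanced:
                open_syllable = pair        # an UNBALANCED pair opens a syllable
            else:
                result.append(pair)         # a BALANCED pair is a syllable by itself
        else:
            open_syllable += pair
            if unbalanced:                  # the next UNBALANCED pair closes it
                result.append(open_syllable)
                open_syllable = None
    return result if open_syllable is None else None


def is_even_even(word):
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0


print(syllables("aababbab"))  # ['aa', 'babbab']

# The split succeeds exactly on EVEN-EVEN words (checked exhaustively up to length 8).
for k in range(9):
    for w in ("".join(p) for p in product("ab", repeat=k)):
        assert (syllables(w) is not None) == is_even_even(w)
```

The syllables of aababbab match the derivation above: aa comes from BALANCED, and babbab comes from UNBALANCED S UNBALANCED with bb in the middle.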

So far, we have demonstrated several regular languages that could also be defined by
CFGs. If all the languages that CFGs could generate were regular, this topic would have
been included in Part I; therefore, the alert reader will expect that CFGs can also generate at
least some nonregular languages too. The following examples show that this is the
case.


EXAMPLE 


Let us consider the CFG 


S → aSb
S → Λ


We shall now show that the language generated by these productions is the
nonregular language {aⁿbⁿ}. There is apparently only one nonterminal S and two terminals a
and b (heretofore we have announced the terminals and nonterminals before stating the production set, yet this is one of those fastidiousnesses one quickly outgrows). As long as we
continue to apply the first production, the working string produced will always have one and
only one nonterminal in it and that will be a central S. Whenever we choose to employ the
second production, the S drops out and what is left is a string of terminals that must then be
a word generated by the grammar. The fact that the S always stays dead-center follows from
the fact that production 1 always replaces the S with a string in which the S is again dead-center. So, if it used to be in the middle, it remains in the middle, and because it starts in the
middle, it stays there, because the middle of the middle section is the middle of the string.
On the left side of the S, we have nothing but a's, and on the right side of the S, we have
nothing but b's. Therefore, after six applications of the first production, we must have the
working string a⁶Sb⁶. If we apply the second production now, the word a⁶b⁶ would be produced.

S ⇒ aSb ⇒ aaSbb
⇒ aaaSbbb ⇒ aaaaSbbbb
⇒ aaaaaSbbbbb ⇒ aaaaaaSbbbbbb
⇒ aaaaaabbbbbb

Clearly, if we use production 1 m times followed by production 2, the resultant word
would be aᵐbᵐ, and (what always must be made separately clear) every word of the form
aᵐbᵐ can be produced this way. Because a sequence of production 1's followed by a single
production 2 is the only word-producing option for this grammar, we can conclude that the
language it generates is exactly {aⁿbⁿ}. ■
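Both halves of the argument (every derivation yields some aᵐbᵐ, and every aᵐbᵐ has a derivation) can be mirrored in a few lines. A sketch with hypothetical helpers: `derivation(m)` applies S → aSb m times and then S → Λ, and `in_anbn` undoes S → aSb from the outside in until only Λ should remain.

```python
def derivation(m):
    """Apply S -> aSb m times, then S -> Λ."""
    steps = ["S"]
    for k in range(1, m + 1):
        steps.append("a" * k + "S" + "b" * k)
    steps.append("a" * m + "b" * m)
    return steps


def in_anbn(word):
    """Undo S -> aSb from the outside in; what remains must be Λ."""
    while word:
        if word[0] == "a" and word[-1] == "b":
            word = word[1:-1]
        else:
            return False
    return True


print(derivation(6)[-1])                                     # aaaaaabbbbbb
print(in_anbn("aaabbb"), in_anbn("aabbb"), in_anbn("abab"))  # True False False
```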


EXAMPLE

If we vary the rules of production slightly, we may arrive at this CFG:

S → aSa
S → bSb
S → Λ


There are a great many similarities between this grammar and the previous one. Repeated applications of the first two rules will produce working strings with exactly one nonterminal, that is, S. Furthermore, this S begins in the middle of the working string and both
rules of production replace it with strings in which it remains in the middle, and the middle
of the middle is the middle of the working string, so S is always the unique and central nonterminal in all working strings.

Let us now note that the right side of each production is a palindrome (it reads the same
backward and forward even if it does contain both terminals and a nonterminal). Let us also
note that if a palindrome is inserted into the dead-center of another palindrome, the resultant
string will again be a palindrome. Once we finally employ production rule 3 and delete the S,
the final word will again be a palindrome. Therefore, all the words produced by this grammar will be in the language PALINDROME. However, it is not true that all the words in the
language PALINDROME can be generated by this grammar. We must observe that palindromes come in two flavors: those with a unique central letter and those with even length
and no central letter. The language generated by this grammar is that of all palindromes
with even length and no center letter, called EVENPALINDROME (cf. p. 204). To prove that
every word in EVENPALINDROME can be produced from this grammar, all we need to do is
take any even palindrome and show that it itself gives us a complete set of directions of
how it is to be produced. These are the directions: Scan the first half of the word left to right;
when we encounter an a, it is the instruction to apply production 1; when we encounter a b,
it is the instruction to apply production 2; when we have finished the first half of the word,
apply production 3. For example, if we start with the even palindrome abbaabba, the first
half is abba and the rules of production to be applied are, in sequence, productions 1, 2, 2, 1,
3, as below:

S ⇒ aSa
⇒ abSba
⇒ abbSbba
⇒ abbaSabba
⇒ abbaabba
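The directions for producing an even palindrome can be sketched as follows, assuming the productions are numbered 1: S → aSa, 2: S → bSb, 3: S → Λ as above; `directions` and `derive` are hypothetical helper names.

```python
RULES = {1: "aSa", 2: "bSb", 3: ""}   # 3 is S -> Λ


def directions(word):
    """Production sequence for an even palindrome: read the first half, then use rule 3."""
    if len(word) % 2 or word != word[::-1] or set(word) - {"a", "b"}:
        raise ValueError("not an even-length palindrome over {a, b}")
    half = word[: len(word) // 2]
    return [1 if c == "a" else 2 for c in half] + [3]


def derive(seq):
    """Replace the single central S according to each production in turn."""
    working, steps = "S", ["S"]
    for p in seq:
        working = working.replace("S", RULES[p])
        steps.append(working)
    return steps


print(directions("abbaabba"))                        # [1, 2, 2, 1, 3]
print(" => ".join(derive(directions("abbaabba"))))
# S => aSa => abSba => abbSbba => abbaSabba => abbaabba
```

Using `str.replace` is safe here only because every working string contains exactly one S.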


EXAMPLE 

The difference between EVENPALINDROME and ODDPALINDROME (whose definition
is obvious) is that when we are finally ready to get rid of the S in the EVENPALINDROME
working string, we must replace it with a Λ. If we were forced to replace it with an a or b
instead, we would create a central letter and the result would be a grammar for ODDPALINDROME as follows:

S → aSa
S → bSb
S → a
S → b

If we allow the option of turning the central S into either Λ or a letter, we would have a
grammar for the entire language PALINDROME:

S → aSa
S → bSb
S → a
S → b
S → Λ

The languages {aⁿbⁿ} and PALINDROME are amazingly similar in grammatical structure, while the first is nearly a regular expression and the other is far from it.
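The three palindrome grammars differ only in what the central S may finally become, so one recognizer can serve all of them. A sketch, assuming words over {a, b}; `derivable` is hypothetical and takes the permitted final replacements of S as a parameter. It peels off matching outer letters, undoing S → aSa and S → bSb, until only the center remains.

```python
def derivable(word, centers):
    """Peel matching outer letters (undoing S -> aSa, S -> bSb) until only the center is left.

    `centers` lists what the last production may turn S into.
    """
    while word not in centers:
        if len(word) >= 2 and word[0] == word[-1] and word[0] in "ab":
            word = word[1:-1]
        else:
            return False
    return True


EVEN = ("",)            # S -> Λ only: EVENPALINDROME
ODD = ("a", "b")        # S -> a | b: ODDPALINDROME
ALL = ("", "a", "b")    # S -> a | b | Λ: PALINDROME

print(derivable("abba", EVEN), derivable("abba", ODD))    # True False
print(derivable("ababa", ODD), derivable("ababa", EVEN))  # True False
```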

EXAMPLE 

One language that we demonstrated was nonregular, which had an appearance similar to
{aⁿbⁿ}, was {aⁿbaⁿ}. This language too can be generated by a CFG

S → aSa
S → b

but the cousin language {aⁿbaⁿbⁿ⁺¹} cannot be generated by any CFG for reasons that we
shall discuss a bit later.
Let us consider one more example of a nonregular language that can be generated by a
CFG.


Let the terminals be a and b, the nonterminals be S, A, and B, and the productions be

S → aB
S → bA
A → a
A → aS
A → bAA
B → b
B → bS
B → aBB

The language that this CFG generates is the language EQUAL of all strings that have an
equal number of a's and b's in them. This language begins

EQUAL = {ab ba aabb abab abba baab baba bbaa aaabbb . . .}

(Notice that previously we included Λ in this language, but for now it has been
dropped.)

Before we begin to prove that this CFG does generate exactly the language EQUAL, we 
should explain the rationale behind this set of productions. The basic idea is that if a word in 
EQUAL starts with an a, the remainder of the word is a string with the property that it has,
in total, exactly one more b than a's. If the remainder has seven a's, then it must have eight
b's, because otherwise a(remainder) will not be in EQUAL. For this purpose, we introduce
the nonterminal symbol B and we intend to write rules that will allow us to generate from B
all strings with the property that they have exactly one more b than a's. Analogously, if a
word in EQUAL starts with a b, it must be of the form bA, where from A we can generate
any string that has in total one more a than b's.

To begin to find a method of generating all the strings that should be derivable from A,
we note that if the A-string begins with the letter a, then the rest will be a word in EQUAL
that is either derivable from S or is Λ. Otherwise, despite the fact that it has one more a than
b's, it might still stubbornly insist on starting with a b. In this case, however, what remains is
a string with the property that it now has two more a's than b's. We could be tempted to introduce a new symbol, say, A₂, as the nonterminal that would stand for these strings, but that
would lead us down a path requiring more and more (eventually infinitely many) nonterminals. Instead, we make the useful observation that any string that contains two more a's than
b's can be factored into the product of two type-A strings, each with exactly one more a than
b's. To prove this, we scan the 2-a-heavy string from left to right until we find a factor that is
of type A. We must eventually have the number of a's surpass the number of b's because otherwise it could not be 2-a-heavy. At the first instant the number of a's passes the number of
b's in the scan (necessarily by exactly one extra), we have found an A-factor. Now what is
left of the string is again a string that is only 1-a-heavy and is, therefore, itself a factor of
type A. This is the reasoning behind the production A → bAA.
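The factorization behind A → bAA is easy to test mechanically. A small sketch, assuming strings over {a, b}; `excess` and `split_two_heavy` are hypothetical helpers. It scans left to right and cuts at the first prefix where the a's lead by exactly one, as described above, then checks the claim exhaustively for short strings (a sanity check, not a proof).

```python
from itertools import product


def excess(s):
    """Number of a's minus number of b's."""
    return s.count("a") - s.count("b")


def split_two_heavy(s):
    """Cut a 2-a-heavy string at the first prefix where the a's lead by exactly one."""
    for i in range(1, len(s) + 1):
        if excess(s[:i]) == 1:
            return s[:i], s[i:]
    raise ValueError("not 2-a-heavy")


# Both factors are 1-a-heavy for every 2-a-heavy string up to length 8.
for k in range(9):
    for s in ("".join(p) for p in product("ab", repeat=k)):
        if excess(s) == 2:
            first, rest = split_two_heavy(s)
            assert excess(first) == 1 and excess(rest) == 1

print(split_two_heavy("baaa"))  # ('baa', 'a')
```

The cut always exists because the running excess starts at 0, changes by one letter at a time, and ends at 2, so it must pass through 1.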

The three productions for B are just symmetric to the A productions. 

Now there is a little bit of a problem here because to produce EQUAL, we defined S to
be bA, assuming that A does generate the 1-a-heavy strings, and later we defined A to be aS,
assuming that S does generate only words in EQUAL. Is this reasoning not circular and
therefore unsound? The answer is that once we know that S, A, and B do their intended jobs
on short strings, we will be certain that they will continue to do their job on longer and
longer strings. Let us discuss this in detail.
From the rules of production we can derive a from A and b from B and therefore both ab
and ba come from S. Now using these building blocks, we can generate from A → aS both
aab and aba, and from A → bAA we get baa. Therefore, all three-letter strings with two a's
and one b can be derived from A. Similarly, all three-letter strings with two b's and one a can
be derived from B.

Now we consider the four-letter strings. A and B generate only odd-length strings so all
the relevant four-letter strings are the words in EQUAL. Once we know that all three-letter
1-a-heavy strings can be derived from A, we can safely conclude that all EQUAL words of
four letters starting with a b can be derived from S → bA. Similarly, once we know that all
three-letter strings derivable from B are the 1-b-heavy strings, we conclude that S → aB
gives all the four-letter words in EQUAL starting with an a and only those. So once we
know that A and B are correct for three-letter words, we know that S is correct for four-letter
words.

Now we bounce back to five-letter words. Starting with the knowledge that S produces
all the two- and four-letter words in EQUAL, and that A and B generate all 1-a-heavy and
1-b-heavy words of length one and three, we have no trouble concluding that the correct and
only the correct five-letter words are derived from A and B by the production rules. We can
then conclude that S generates all the six-letter words in EQUAL and only those, and so on.
The reasoning behind the productions is not circular but inductive. The S's in S → bA
and A → aS are not the same S because the second one is two letters shorter. We could also
see a parallel between this reasoning and recursive definitions: “If x has the property, then so
does xx, and so on.”

Therefore, all the words derivable from S are the words in EQUAL and all the words in
EQUAL are generated by S. ■

It is common for the same nonterminal to be the left side of more than one production.
We now introduce the symbol “|”, a vertical line, to mean disjunction (or). Using it, we can
combine all the productions that have the same left side. For example,

S → aS
S → Λ

can be written simply as

S → aS | Λ

The CFG

S → X
S → Y
X → Λ
Y → aY
Y → bY
Y → a
Y → b

can be written more compactly as

S → X | Y
X → Λ
Y → aY | bY | a | b
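The | notation is pure shorthand, and expanding it back into separate productions is mechanical. A sketch, assuming each line has the form `LHS -> alt | alt | ...` with an ASCII arrow and that | is never itself a terminal; `expand` is a hypothetical helper.

```python
def expand(compact):
    """Turn lines like 'Y -> aY | bY | a | b' into one (LHS, RHS) pair per alternative."""
    productions = []
    for line in compact.strip().splitlines():
        lhs, rhs = (part.strip() for part in line.split("->"))
        productions.extend((lhs, alt.strip()) for alt in rhs.split("|"))
    return productions


compact = """
S -> X | Y
X -> Λ
Y -> aY | bY | a | b
"""
for lhs, rhs in expand(compact):
    print(lhs, "->", rhs)
```

The seven printed productions are exactly the uncompressed list given above.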

The notation we are using for CFGs is practically universal with the following
changes:

Some authors use the symbol

::=

instead of

→

Some authors call nonterminals variables. 

Some authors use an epsilon, ε, or lambda, λ, instead of Λ to denote the null string.

Some authors indicate nonterminals by writing them in angle brackets:

<S> → <X> | <Y>
<X> → Λ
<Y> → a<Y> | b<Y> | a | b

We shall be careful to use capital letters for nonterminals and lowercase letters for terminals. Even if we did not do this, it would not be hard to determine when a symbol is a terminal. All symbols that do not appear as the left parts of productions are terminals with the
exception of Λ.

Aside from these minor variations, we call this format (arrows, vertical bars, terminals, and nonterminals) for presenting a CFG the BNF, which stands for Backus normal
form or Backus-Naur form. It was invented by John W. Backus for describing the high-level language ALGOL. Peter Naur was the editor of the report in which it appeared, and
that is why BNF has two possible meanings.

A FORTRAN identifier (variable or storage location name) can, by definition, be up to 
six alphanumeric characters long but must start with a letter. We can generate the language 
of all FORTRAN identifiers by a CFG: 


IDENTIFIER → LETTER XXXXX
X → LETTER | DIGIT | Λ
LETTER → A | B | C | . . . | Z
DIGIT → 0 | 1 | 2 | . . . | 9
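Because this grammar is so small, a recognizer can follow it directly. A sketch, assuming identifiers use uppercase letters only, as in the LETTER productions; `is_identifier` is hypothetical and checks only the rule stated here, not any other rule a real FORTRAN compiler might enforce.

```python
import string

LETTERS = set(string.ascii_uppercase)  # LETTER -> A | B | ... | Z
DIGITS = set(string.digits)            # DIGIT -> 0 | 1 | ... | 9


def is_identifier(s):
    """IDENTIFIER -> LETTER XXXXX, where each X -> LETTER | DIGIT | Λ."""
    if not s or s[0] not in LETTERS:
        return False
    rest = s[1:]
    # Each of the five X's yields one letter, one digit, or nothing,
    # so together they contribute between 0 and 5 alphanumeric characters.
    return len(rest) <= 5 and all(c in LETTERS or c in DIGITS for c in rest)


for s in ["X", "ALPHA1", "ALPHA12", "9LIVES", "K9"]:
    print(s, is_identifier(s))
```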


Not just the language of identifiers but the language of all proper FORTRAN instructions can be defined by a CFG. This is also true of all the statements in the languages C,
PASCAL, BASIC, PL/I, and so on. This is not an accident. As we shall see later, if we are
given a word generated by a specified CFG, we can determine how the word was produced.
This, in turn, enables us to understand the intended instruction of the word just as identifying
the parts of speech helps us to understand the structure of an English sentence. A computer
must determine the grammatical structure of a computer language statement before it can execute the instruction. Let us revisit our early school days.


TREES 


In English grammar courses, we were taught how to diagram a sentence. This meant that we 
were to draw a parse tree, which is a picture with the base line divided into subject and 
predicate. All words or phrases modifying these were drawn as appendages on connecting 
lines. For example, 

The quick brown fox jumps over the lazy dog. 


becomes 
If the fox is dappled gray, then the parse tree would be




because dappled modifies gray and therefore is drawn as a branch off the gray line. 
The sentence “I shot the man with the gun.” can be diagrammed in two ways:








In the first diagram, “with the gun” explains how I shot. In the second diagram, “with
the gun” explains whom I shot.

These diagrams turn a string of ambiguous symbols into an interpretable idea by identifying who does what to whom.

A famous case of ambiguity is the sentence “Time flies like an arrow.” We humans have
no difficulty identifying this as a poetic lament, technically a simile, meaning “Time passes
all too quickly, just as a speeding arrow darts inexorably across the endless skies,” or some
such euphuism.

This is diagrammed by the following parse tree:




Notice how the picture grows like a tree when “an” branches from “arrow.” A graph theory tree, unlike an arboreal tree, can grow sideways or upside down.
A nonnative speaker of English with no poetry in her soul (a computer, e.g.) who has
just yesterday read the sentence “Horse flies like a banana” might think the sentence should 
be diagrammed as 



where she thinks “time flies” may have even shorter lives than drosophilae. 

Looking in our dictionary, we see that “time” is also a verb, and if so in this case, the 
sentence could be in the imperative mood with the understood subject “you,” in the same 
way that “you” is the understood subject of the sentence “Close the door.” A race track tout 
may ask a jockey to do a favor and “Time horses like a trainer” for him. The computer might 
think this sentence should be diagrammed as 



Someone is being asked to take a stopwatch and “time” some racing “flies” just as “an arrow” 
might do the same job, although one is unlikely to meet a straight arrow at the race track. 

The idea of diagramming a sentence to show how it should be parsed carries over to 
CFGs. We start with the symbol S. Every time we use a production to replace a nonterminal 
by a string, we draw downward lines from the nonterminal to each character in the string. 

Let us illustrate this on the CFG 

S → AA

A → AAA | bA | Ab | a

We begin with S and apply the production S → AA:

[Tree: S branches to A and A.]

To the left-hand A, let us apply the production A → bA. To the right-hand A, let us apply
A → AAA:


[Tree: S branches to A and A; the left A branches to b and A; the right A branches to A, A, and A.]


The b that we have on the bottom line is a terminal, so it does not descend further. In the terminology of trees, it is called a terminal node. Let the four A's, left to right, undergo the
productions A → bA, A → a, A → a, A → Ab, respectively. We now have
[Tree: as before, but now the A under the left A branches to b and A; the first two A's under the right A each branch to a; the third A under the right A branches to A and b.]

Let us finish off the generation of a word with the productions A → a and A → a:


[Tree: as before, with each of the two remaining A's branching to a. The leaves, read left to right, are b, b, a, a, a, a, b.]


Reading from left to right, we see that the word we have produced is bbaaaab. 
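The same bookkeeping can be done by machine. A minimal sketch, assuming a tree is stored as nested tuples (label, children) with terminal leaves as one-letter strings; `word_of` and `productions_used` are hypothetical helpers. The tree below encodes the one just built for bbaaaab.

```python
# A node is (label, [children]); a terminal leaf is a one-letter string.
tree = ("S", [
    ("A", ["b", ("A", ["b", ("A", ["a"])])]),
    ("A", [("A", ["a"]), ("A", ["a"]), ("A", [("A", ["a"]), "b"])]),
])

ALLOWED = {"S -> AA", "A -> AAA", "A -> bA", "A -> Ab", "A -> a"}


def word_of(node):
    """Read the terminal leaves from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(word_of(child) for child in children)


def productions_used(node):
    """Each interior node records the production that replaced its nonterminal."""
    if isinstance(node, str):
        return []
    label, children = node
    rhs = "".join(c if isinstance(c, str) else c[0] for c in children)
    used = [f"{label} -> {rhs}"]
    for child in children:
        used += productions_used(child)
    return used


print(word_of(tree))                           # bbaaaab
print(set(productions_used(tree)) <= ALLOWED)  # True
```

Reading the leaves in order gives the word, while the interior nodes keep the structure that a flat derivation loses.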

As was the case with diagramming a sentence, we understand more about the finished
word if we see the whole tree. The third and fourth letters are both a's, but they are produced
by completely different branches of the tree.

These tree diagrams are called syntax trees, parse trees, generation trees, production
trees, or derivation trees. The variety of terminology comes from the multiplicity of applications to linguistics, compiler design, and mathematical logic.

The only rule for formation of such a tree is that every nonterminal sprouts branches
leading to every character in the right side of the production that replaces it. If the nonterminal N can be replaced by the string abcde,


N → abcde


then in the tree we draw 


[Tree: N with five branches leading down to a, b, c, d, and e.]

There is no need to put arrow heads on the edges because the direction of production is always downward.


EXAMPLE 

One CFG for a subsystem of propositional calculus is 

S → (S) | S ⊃ S | ~S | p | q

The only nonterminal is S. The terminals are p q ~ ⊃ ( ), where “⊃” is today's symbol
for implication.
In this grammar, consider the diagram

[Derivation tree: S branches to (, S, ); the inner S branches to S, ⊃, S; the right-hand S again branches to (, S, ), whose S branches to S, ⊃, S.]
This is a derivation tree for the 13-letter word:

(~~p ⊃ (p ⊃ ~~q))

We often say that to know the derivation tree for a given word in a given grammar is to 
understand the “meaning” of that word. 

The concept of “meaning” is one that we shall not deal with in this book. We never presumed that the languages generated by our CFGs have any significance beyond being formal
strings of symbols. However, in some languages the grammatical derivation of a string of 
symbols is important to us for reasons of computation. We shall soon see that knowing the 
tree helps us determine how to evaluate and compute. 


LUKASIEWICZ NOTATION 

Let us concentrate for a moment on an example of a CFG for a simplified version of arithmetic expressions:

S → S + S | S * S | number

Let us presume that we know precisely what is meant by “number.”

We are all familiar with the ambiguity inherent in the expression 

3 + 4*5 

Does it mean (3 + 4) * 5, which is 35, or does it mean 3 + (4 * 5), which is 23? 

In the language defined by this particular CFG, we do not have the option of putting in 
parentheses for clarification. Parentheses are not generated by any of the productions and are 
therefore not letters in the derived language. There is no question that 3 + 4 * 5 is a word in 
the language of this CFG. The only question is what does this word intend in terms of calculation?
It is true that if we insisted on parentheses by using the grammar

S → (S + S) | (S * S) | number

we could not produce the string 3 + 4 * 5 at all. We could only produce

(3 + (4 * 5))   or   ((3 + 4) * 5)

neither of which is an ambiguous expression.

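In the practical world, we do not need to use all these cluttering parentheses because we
have adopted the convention of “hierarchy of operators,” which says that * is to be executed
before +. This, unfortunately, is not reflected in either grammar. Later, we present a grammar that generates unambiguous arithmetic expressions that will mean exactly what we want
them to mean without the need for burdensome parentheses. For now, we can only distinguish between these two possible meanings for the expression 3 + 4 * 5 by looking at the
two possible derivation trees that might have produced it:

[Two derivation trees: in one, S branches to S + S and the right-hand S branches to S * S (the reading 3 + (4 * 5)); in the other, S branches to S * S and the left-hand S branches to S + S (the reading (3 + 4) * 5).]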


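We can evaluate an expression in parse-tree form from the tree picture itself by starting
at the bottom and working our way up to the top, replacing each nonterminal as we come to
it by the result of the calculation that it produces.

This can be done as follows:

[In the first tree, the subtree for 4 * 5 is replaced by 20 and then the top S by 3 + 20 = 23; in the second tree, the subtree for 3 + 4 is replaced by 7 and then the top S by 7 * 5 = 35.]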
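Evaluating from the bottom up is a post-order traversal: evaluate the subtrees first, then combine. The sketch below simplifies each derivation tree by collapsing every S → S + S or S → S * S node into its operator; the tuple encoding (operator, left, right) and the name `evaluate` are our own choices, not the text's.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# A tree is a number or a tuple (operator, left subtree, right subtree).
plus_on_top = ("+", 3, ("*", 4, 5))    # the reading 3 + (4 * 5)
times_on_top = ("*", ("+", 3, 4), 5)   # the reading (3 + 4) * 5


def evaluate(tree):
    """Work from the bottom up: evaluate both subtrees, then apply the operator."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


print(evaluate(plus_on_top), evaluate(times_on_top))  # 23 35
```

The same string 3 + 4 * 5 gives two different values, which is exactly the ambiguity the two trees record.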
These examples show how the derivation tree can explain what the expression intends in
much the same way that the parse trees in English grammar explain the intention of sentences.

In the special case of this particular grammar (not for CFGs in general), we can draw
meaningful trees of terminals alone using the start symbol S only once. This will enable us
to introduce a new notation for arithmetic expressions, one that has direct applications to
computer science.

The method for drawing the new trees is based on the fact that + and * are binary operations that combine expressions already in the proper form. The expression 3 + (4 * 5) is a
sum. A sum of what? A sum of a number and a product. What product? The product of two
numbers. Similarly, (3 + 4) * 5 is a product of a sum and a number, where the sum is a sum
of numbers. Notice the similarity to the original recursive definition of arithmetic expressions. These two situations are depicted in the following trees:

[Two trees: + with branches 3 and *, where * has branches 4 and 5; and * with branches + and 5, where + has branches 3 and 4.]


These are like derivation trees for the CFG

S → S + S | S * S | number

except that we have eliminated most of the S's. We have connected the branches directly to
the operators instead.

The symbols * and + are no longer terminals, because they must be replaced by numbers. These are actually standard derivation trees taken from a new CFG in which S, *, and
+ are nonterminals and number is the only terminal. The productions are

S → * | + | number
+ → + + | + * | + number | * + | * * | * number | number + | number * | number number
* → + + | + * | + number | * + | * * | * number | number + | number * | number number


As usual, number has been underlined because it is only one symbol in this case, our only
terminal.

From these trees, we can construct a new notation for arithmetic expressions. To do this,
we walk around the tree and write down the symbols, once each, as we encounter them. We
begin our trip on the left side of the start symbol S heading south. As we walk around the
tree, we always keep our left hand on the tree.
The first symbol we encounter on the first tree is +. This we write down as the first
symbol of the expression in the new notation. Continuing to walk around the tree, keeping it
on our left, we first meet 3, then + again. We write down the 3, but this time we do not write
down + because we have already included it in the string we are producing. Walking some
more, we meet *, which we write down. Then we meet 4, then * again, then 5. So, we write
down 4, then 5. There are no symbols we have not met, so our trip is done. The string we
have produced is

+ 3 * 4 5

The second derivation tree when converted into the new notation becomes

* + 3 4 5




This tree-walking method produces a string of the symbols +, *, and number, which
summarizes the picture of the tree and thus contains the information necessary to interpret
the expression. This is information that is lacking in our usual representation of arithmetic
expressions, unless parentheses are inserted. We shall show that these strings are unambiguous in that each determines a unique calculation without the need for establishing the hierarchical convention of times before plus. These representations are said to be in operator prefix notation because the operator is written in front of the operands it combines.

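Since S → S + S has changed from a tree in which S branches to S, +, and S into a tree in
which + itself branches directly to its two operands, the left-hand tracing changes 3 + 4
into + 3 4.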

To evaluate a string of characters in this new notation, we proceed as follows. We read the string from left to right. When we find the first substring of the form

operator-operand-operand (call this o-o-o for short)

we replace these three symbols with the one result of the indicated arithmetic calculation. We then rescan the string from the left. We continue this process until there is only one number left, which is the value of the entire original expression.
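The rescanning procedure translates almost line for line into code. This is a sketch under stated assumptions: tokens are separated by spaces, the only operators are + and *, and operands are nonnegative integers; `evaluate_prefix` is a name chosen for illustration.

```python
OPS = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}


def evaluate_prefix(expr: str) -> int:
    """Repeatedly replace the leftmost operator-operand-operand substring
    by its value until a single number is left."""
    tokens = expr.split()
    while len(tokens) > 1:
        # scan from the left for the first o-o-o window
        for i in range(len(tokens) - 2):
            op, x, y = tokens[i:i + 3]
            if op in OPS and x not in OPS and y not in OPS:
                tokens[i:i + 3] = [str(OPS[op](int(x), int(y)))]
                break  # rescan from the left
        else:
            raise ValueError("no o-o-o substring: not a well-formed prefix expression")
    return int(tokens[0])


print(evaluate_prefix("+ 3 * 4 5"))  # 23
print(evaluate_prefix("* + 3 4 5"))  # 35
```

Rescanning from the start after every replacement makes this quadratic in the length of the string; the point is to mirror the hand procedure, not to be fast.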

In the case of the expression + 3 * 4 5, the first substring we encounter of the form operator-operand-operand is * 4 5, so we replace this with the result of the indicated multiplication, that is, the number 20. The string is now + 3 20. This itself is in the form o-o-o, and we evaluate it by performing the addition. When we replace this with the number 23, we see that the process of evaluation is complete.

In the case of the expression * + 3 4 5, the first o-o-o substring is + 3 4. This we replace with the number 7. The string is then * 7 5, which itself is in the o-o-o form. When we replace this with 35, the evaluation process is complete.

Let us see how this process works on a harder example. Let us start with the arithmetic expression

((1 + 2) * (3 + 4) + 5) * 6

This is shown in normal notation, which is called operator infix notation because the 
operators are placed in between the operands. With infix notation, we often need to use 
parentheses to avoid ambiguity, as is the case with the expression above. To convert this to 
operator prefix notation, we begin by drawing its derivation tree. 


Reading around this tree gives the equivalent prefix notation expression

* + * + 1 2 + 3 4 5 6

Notice that the operands are in the same order in prefix notation as they were in infix notation; only the operators are scrambled and all parentheses are deleted.
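To convert without drawing the tree, one can parse the infix string into a tree and then walk it as before. The sketch below uses recursive descent (a standard parsing technique, not the method of the text). It assumes the usual convention of times before plus and groups repeated operators from the left, so 1 * 2 * 3 is read as (1 * 2) * 3; `infix_to_prefix` is a hypothetical helper name.

```python
import re


def infix_to_prefix(expr: str) -> str:
    """Convert an infix expression over +, *, parentheses, and nonnegative
    integers into operator prefix notation."""
    tokens = re.findall(r"\d+|[+*()]", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def factor():  # number | ( expression )
        if peek() == "(":
            take()
            node = expression()
            if take() != ")":
                raise ValueError("missing )")
            return node
        tok = take()
        if not tok.isdigit():
            raise ValueError(f"expected a number, got {tok!r}")
        return tok

    def term():  # factor * factor * ...
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node

    def expression():  # term + term + ...
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    def preorder(node):  # the left-hand tree walk
        if isinstance(node, str):
            return [node]
        op, left, right = node
        return [op] + preorder(left) + preorder(right)

    tree = expression()
    if pos != len(tokens):
        raise ValueError("unexpected input after expression")
    return " ".join(preorder(tree))


print(infix_to_prefix("((1 + 2) * (3 + 4) + 5) * 6"))  # * + * + 1 2 + 3 4 5 6
```

The parentheses steer the parse and then disappear; only the tree they produced survives in the prefix string.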

To evaluate this string, we see that the first substring of the form operator-operand-operand is + 1 2, which we replace with the number 3. The evaluation continues as follows:

String                     First o-o-o Substring
* + * 3 + 3 4 5 6          + 3 4
* + * 3 7 5 6              * 3 7
* + 21 5 6                 + 21 5
* 26 6                     * 26 6
156

which is the correct value for the expression with which we started. ■

Because the derivation tree is unambiguous, the prefix notation is also unambiguous and 
does not rely on the tacit understanding of operator hierarchy or on the use of parentheses. 
This clever parenthesis-free notational scheme was invented by the Polish logician Lukasiewicz (1878-1956) and is often called Polish notation. There is a similar operator postfix notation, which is also called Polish notation, in which the operation symbols (+, *, . . .) come after the operands. This can be derived by tracing around the tree from the other side, keeping our right hand on the tree, and then reversing the resultant string. Both of these methods of notation are useful for computer science. Compilers often convert infix to prefix and then to assembler code.
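The right-hand tracing and reversal can be sketched directly. Trees are written here as nested `(operator, left, right)` tuples, an assumed format. The stack evaluator at the end is the standard way of evaluating postfix and is not taken from the text.

```python
def right_hand_trace(tree):
    """Walk around the tree keeping our right hand on it: an operator is
    recorded when first met, then its right subtree, then its left."""
    if isinstance(tree, int):
        return [str(tree)]
    op, left, right = tree
    return [op] + right_hand_trace(right) + right_hand_trace(left)


def to_postfix(tree):
    # reversing the right-hand tracing puts each operator after its operands
    return right_hand_trace(tree)[::-1]


def evaluate_postfix(tokens):
    """Stack evaluation: push numbers; an operator pops two values."""
    stack = []
    for tok in tokens:
        if tok in ("+", "*"):
            y = stack.pop()
            x = stack.pop()
            stack.append(x + y if tok == "+" else x * y)
        else:
            stack.append(int(tok))
    (result,) = stack  # a well-formed expression leaves exactly one value
    return result


# ((1 + 2) * (3 + 4) + 5) * 6
tree = ("*", ("+", ("*", ("+", 1, 2), ("+", 3, 4)), 5), 6)
print(" ".join(to_postfix(tree)))          # 1 2 + 3 4 + * 5 + 6 *
print(evaluate_postfix(to_postfix(tree)))  # 156
```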


AMBIGUITY 

EXAMPLE 

Let us consider the language generated by the following CFG: 

Prod 1 S → AB
Prod 2 A → a
Prod 3 B → b

There are two different sequences of applications of the productions that generate the word ab. One is Prod 1, Prod 2, Prod 3. The other is Prod 1, Prod 3, Prod 2.

S ⇒ AB ⇒ aB ⇒ ab

S ⇒ AB ⇒ Ab ⇒ ab

However, when we draw the corresponding syntax trees, we see that the two derivations are essentially the same:

  S          S
 / \        / \
A   B      A   B
|   |      |   |
a   b      a   b

This example, then, presents no substantive difficulty because there is no ambiguity of interpretation. When all the possible derivation trees are the same for a given word, then the word is unambiguous.


DEFINITION 


A CFG is called ambiguous if for at least one word in the language that it generates there are two possible derivations of the word that correspond to different syntax trees. If a CFG is not ambiguous, it is called unambiguous.






EXAMPLE 

Let us reconsider the language PALINDROME, which we saw earlier can be generated by the CFG below:

S → aSa | bSb | a | b | Λ

At every stage in the generation of a word by this grammar, the working string contains only the one nonterminal S smack dab in the middle. The word grows like a tree from the center out. For example,

. . . ⇒ baSab ⇒ babSbab ⇒ babbSbbab ⇒ babbaSabbab ⇒ . . .

When we finally replace S by a center letter (or Λ if the word has no center letter), we have completed the production of a palindrome. The word aabaa has only one possible generation:

S ⇒ aSa
  ⇒ aaSaa
  ⇒ aabaa

      S
    / | \
   a  S  a
    / | \
   a  S  a
      |
      b

If any other production were applied at any stage in the derivation, a different word would be produced. Every word in PALINDROME has a unique sequence of productions leading to it. As we read the first half left to right, an a means use S → aSa, a b means use S → bSb, and the middle letter determines the final production.

We see then that this CFG is unambiguous. ■
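The argument is constructive: the production sequence can be read straight off the word. A small sketch (the function name is hypothetical) for the grammar S → aSa | bSb | a | b | Λ:

```python
def palindrome_productions(word: str) -> list[str]:
    """Return the unique sequence of productions of S -> aSa | bSb | a | b | Λ
    that derives `word`, reading the first half left to right."""
    if word != word[::-1] or set(word) - {"a", "b"}:
        raise ValueError(f"{word!r} is not a palindrome over {{a, b}}")
    half = len(word) // 2
    steps = [f"S -> {c}S{c}" for c in word[:half]]
    # the middle letter, or its absence, determines the final production
    steps.append(f"S -> {word[half]}" if len(word) % 2 else "S -> Λ")
    return steps


print(palindrome_productions("aabaa"))  # ['S -> aSa', 'S -> aSa', 'S -> b']
```

Because no choice is ever left open, there is exactly one derivation tree per word.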

EXAMPLE 

The language of all nonnull strings of a's can be defined by a CFG as follows:

S → aS | Sa | a

In this case, the word a³ can be generated by four different trees:

  S            S            S            S
 / \          / \          / \          / \
a   S        a   S        S   a        S   a
   / \          / \      / \          / \
  a   S        S   a    a   S        S   a
      |        |            |        |
      a        a            a        a

This CFG is therefore ambiguous. 

However, the same language can also be defined by the CFG 

for which the word a 3 has only one production: 
This CFG is not ambiguous.

From this last example, we see that we must be careful to say that it is the CFG that is 
ambiguous, not that the language it generates is itself ambiguous. 
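Counting derivation trees makes the difference between grammars for the same language concrete. Below is a minimal sketch. The dynamic-programming approach and the name `count_trees` are illustrative choices, not the text's. It assumes single-letter symbols with nonterminals in uppercase, and no Λ-productions or unit productions such as S → X, which would send the recursion into a loop. For an unambiguous grammar of the same language we assume S → aS | a.

```python
from functools import lru_cache


def count_trees(grammar: dict[str, list[str]], start: str, word: str) -> int:
    """Count the distinct derivation trees of `word` from `start`."""

    @lru_cache(maxsize=None)
    def trees(symbol: str, i: int, j: int) -> int:
        # number of trees that derive word[i:j] from one symbol
        if symbol.islower():
            return 1 if j - i == 1 and word[i] == symbol else 0
        return sum(splits(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def splits(rhs: str, i: int, j: int) -> int:
        # ways the symbols of rhs derive word[i:j], each taking >= 1 letter
        if len(rhs) == 1:
            return trees(rhs, i, j)
        return sum(trees(rhs[0], i, k) * splits(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return trees(start, 0, len(word))


print(count_trees({"S": ["aS", "Sa", "a"]}, "S", "aaa"))  # 4
print(count_trees({"S": ["aS", "a"]}, "S", "aaa"))        # 1
```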

THE TOTAL LANGUAGE TREE

So far in this chapter, we have seen that derivation trees carry with them an additional amount of information that helps resolve ambiguity in cases where interpretation is important. Trees can be useful in the study of formal grammars in other ways.

For example, it is possible to depict the generation of all the words in the language of a CFG simultaneously in one big (possibly infinite) tree.

DEFINITION

For a given CFG, we define a tree with the start symbol S as its root and whose nodes are working strings of terminals and nonterminals. The descendants of each node are all the possible results of applying every applicable production to the working string, one at a time. A string of all terminals is a terminal node in the tree.

The resultant tree is called the total language tree of the CFG. ■

EXAMPLE

For the CFG

S → aa | bX | aXX
X → ab | b

the total language tree is

S
   aa
   bX
      bab
      bb
   aXX
      aabX
         aabab
         aabb
      abX
         abab
         abb
      aXab
         aabab
         abab
      aXb
         aabb
         abb

This total language has only seven different words. Four of its words (abb, aabb, abab, aabab) have two different possible derivations because they appear as terminal nodes in this total language tree in two different places. However, the words are not generated by two different derivation trees and the grammar is unambiguous. For example,

    S
  / | \
 a  X  X
    |  |
    b  b

■

EXAMPLE 

Consider the CFG 

S → aSb | bS | a

We have the terminal letters a and b and three possible choices of substitutions for S at any stage. The total tree of this language begins

Here, we have circled the terminal nodes because they are the words in the language generated by this CFG. We say “begins” because since the language is infinite, the total language tree is too.

We have already generated all the words in this language with one, two, or three letters:

L = {a ba aab bba . . .}

These trees may get arbitrarily wide as well as infinitely long. ■
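The total language tree can also be explored breadth first by machine. A sketch, assuming one-letter symbols with uppercase nonterminals and a grammar with no Λ-productions, so a working string never shrinks and any node longer than the bound can be pruned without losing short words. `words_up_to` is a hypothetical name. Unlike the literal tree, it merges identical working strings; that changes the shape of what is explored but not the set of words found.

```python
from collections import deque


def words_up_to(grammar: dict[str, list[str]], start: str, max_len: int) -> set[str]:
    """Collect the terminal nodes of length <= max_len in the total language tree."""
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        positions = [i for i, c in enumerate(node) if c.isupper()]
        if not positions:
            words.add(node)  # a terminal node: a word of the language
            continue
        # descendants: apply every applicable production, one at a time
        for i in positions:
            for rhs in grammar[node[i]]:
                child = node[:i] + rhs + node[i + 1:]
                if len(child) <= max_len and child not in seen:
                    seen.add(child)
                    queue.append(child)
    return words


print(sorted(words_up_to({"S": ["aSb", "bS", "a"]}, "S", 3)))
# ['a', 'aab', 'ba', 'bba']
```

The printed list matches the words of one, two, or three letters found above.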

EXAMPLE

S → SAS | b
A → ba | b

Every string with some S's and some A's has many possible productions that apply to it, two for each S and two for each A:

S
   SAS
      SASAS
         SASASAS   bASAS   SbaSAS   SbSAS   SASASAS   . . .
      bAS
      SbaS
      SbS
      SASAS
      SAb
   b

There are more words in this language, but we have not reached them yet. The word bbb will come up shortly.

The essence of recursive definition comes into play in an obvious way when some nonterminal has a production with a right-side string containing its own name, as in this case:

X → (blah)X(blah)

The total tree for such a language then must be infinite because it contains the branch

X ⇒ (blah)X(blah)
  ⇒ (blah)(blah)X(blah)(blah)
  ⇒ (blah)³X(blah)³

This has a deep significance that will be important to us shortly.

Surprisingly, even when the whole language tree is infinite, the language may have only finitely many words.

EXAMPLE

Consider this CFG:

S → X | b
X → aX

The total language tree begins

S
   X
      aX
         aaX
   b

Clearly, the only word in this language is the single letter b. X is a bad mistake; it leads to no words, because once a working string has got X, it can never be cured of it. ■

PROBLEMS

1. Consider the CFG

S → aS | bb

Prove that this generates the language defined by the regular expression

a*bb

2. Consider the CFG

S → XYX
X → aX | bX | Λ
Y → bbb

Prove that this generates the language of all strings with a triple b in them, which is the language defined by

(a + b)*bbb(a + b)*

3. (i) Consider the CFG

S → aX
X → aX | bX | Λ

What is the language this CFG generates?

(ii) Consider the CFG

S → XaXaX
X → aX | bX | Λ

What is the language this CFG generates?

4. Consider the CFG

S → SS | XaXaX | Λ
X → bX | Λ

(i) Prove that X can generate any b*.

(ii) Prove that XaXaX can generate any b*ab*ab*.

(iii) Prove that S can generate (b*ab*ab*)*.

(iv) Prove that the language of this CFG is the set of all words in (a + b)* with an even number of a's with the following exception: We consider the word Λ to have an even number of a's, as do all words with no a's, but of the words with no a's only Λ can be generated.

(v) Show how the difficulty in part (iv) can be alleviated by adding the production

5. Consider the CFG

S → XbaaX | aX
X → Xa | Xb | Λ

What is the language this generates? Find a word in this language that can be generated in two substantially different ways.

6. (i) Consider the CFG for “some English” given in this chapter. Show how these productions can generate the sentence

Itchy the bear hugs jumpy the dog.

(ii) Change the productions so that an article cannot come between an adjective and its noun.

(iii) Show how in the CFG for “some English” we can generate the sentence

The the the cat follows cat.

(iv) Change the productions again so that the same noun cannot have more than one article.
7. Find a CFG for each of the languages defined by the following regular expressions:

(i) ab*

(ii) a*b*

(iii) (baa + abb)*

8. Find CFGs for the following languages over the alphabet Σ = {a b}:

(i) All words in which the letter b is never tripled.

(ii) All words that have exactly two or three b's.

(iii) All words that do not have the substring ab.

(iv) All words that do not have the substring baa.

(v) All words that have different first and last letters:

{ab ba aab abb baa bba . . .}

9. Consider the CFG

S → AA
A → AAA | bA | Ab | a

Prove that the language generated by these productions is the set of all words with an even number of a's, but not no a's. Contrast this grammar with the CFG in Problem 4.

10. Describe the language generated by the following CFG:

S → SS
S → XXX
X → aX | Xa | b

11. Write a CFG to generate the language MOREA of all strings that have more a's than b's (not necessarily only one more, as with the nonterminal A for the language EQUAL, but any number more a's than b's).

MOREA = {a aa aab aba baa aaaa aaab . . .}

12. Let L be any language. We have already defined the transpose of L to be the language of all the words in L spelled backward (see Chapter 6, Problem 17). Show that if L is a context-free language, then the transpose of L is context-free also.

13. In Chapter 10, Problem 4, we showed that the language

TRAILING-COUNT = {s a^length(s) for all s in (a + b)*}

is nonregular. Show however that it is context-free and generated by

S → aSa | bSa | Λ

14. (i) In response to “Time flies like an arrow,” the tout said, “My watch must be broken.” How many possible interpretations of this reply are there?

(ii) Chomsky found three different interpretations for “I had a book stolen.” Explain them. Are their parsing trees different?

15. Below is a set of words and a set of CFGs. For each word, determine whether the word is in the language of each CFG and, if it is, draw a syntax tree to prove it.

Words

(ii) aaaa
(iii) aabb
(iv) abaa
(v) abba
(vi) baaa
(vii) abab
(viii) bbaa
(ix) baab

CFGs

CFG 2. S → aS | bS | a

CFG 3. S → aS | aSb | X
       X → aXa | a

CFG 4. S → aAS | a
       A → SbA | SS | ba

CFG 5. S → aB | bA
       A → a | aS | bAA
       B → b | bS | aBB

16. Show that the following CFGs are ambiguous by finding a word with two distinct syntax trees:

(i) S → SaSaS | b

(ii) S → aSb | Sb | Sa | a

(iii) S → aaS | aaaS | a

(iv) S → aS | aSb | X
X → Xa | a

(v) S → AA
A → AAA | a | bA | Ab

17. Show that the following CFGs that use Λ are ambiguous:

(i) S → XaX
X → aX | bX | Λ

(ii) S → aSX | Λ
X → aX | a

(iii) S → aS | bS | aaS | Λ

(iv) Find unambiguous CFGs that generate these three languages.

(v) For each of these three languages, find an unambiguous grammar that generates exactly the same language except for the word Λ. Do this by not employing the symbol Λ in the CFGs at all.

18. Begin to draw the total language trees for the following CFGs until we can be sure we have found all the words in these languages with one, two, three, or four letters. Which of these CFGs are ambiguous?

(i) S → aS | bS | a

(ii) S → aSaS | b

(iii) S → aSa | bSb | a

(iv) S → aSb | bX
X → bX | b

(v) S → bA | aB
A → bAA | aS | a
B → aBB | bS | b


19. Convert the following infix expressions into Polish notation: 

(i) 1*2*3 

(ii) 1*2 + 3 

(iii) 1 * (2 + 3) 

(iv) 1 * (2 + 3) * 4 

(v) ((1 + 2) * 3) + 4 
(vi) 1 + (2 * (3 + 4))

(vii) 1 + (2 * 3) + 4 

20. Invent a form of prefix notation for the system of propositional calculus used in this chapter that enables us to write all well-formed formulas without the need for parentheses (and without ambiguity).



CHAPTER 13 


Grammatical Format








REGULAR GRAMMARS 

Some of the examples of languages we have generated by CFGs have been regular languages; that is, they are definable by regular expressions. However, we have also seen some nonregular languages that can be generated by CFGs (PALINDROME and EQUAL).

What then is the relationship between regular languages and context-free grammars? 
Several possibilities come to mind: 

1. All possible languages can be generated by CFGs. 

2. All regular languages can be generated by CFGs, and so can some nonregular languages 
but not all possible languages. 

3. Some regular languages can be generated by CFGs and some regular languages cannot 
be generated by CFGs. Some nonregular languages can be generated by CFGs and 
maybe some nonregular languages cannot. 

Of these three possibilities, number 2 is correct. In this chapter, we shall indeed show that all regular languages can be generated by CFGs. We leave the construction of a language that cannot be generated by any CFG for Chapter 16.

Before we proceed to prove this, it will be useful for us to introduce the notion of a 
semiword. 

DEFINITION 

For a given CFG, a semiword is a string of terminals (maybe none) concatenated with exactly one nonterminal (on the right). In general, a semiword has the shape

(terminal)(terminal) . . . (terminal)(Nonterminal)




THEOREM 21 


Given any FA, there is a CFG that generates exactly the language accepted by the FA. In 
other words, all regular languages are context-free languages. 


PROOF

The proof will be by constructive algorithm. We shall show how to start with the FA and create one such CFG.

Step 1 The nonterminals in the CFG will be all the names of the states in the FA with 
the start state renamed S. 


Step 2 For every edge from a state X to a state Y labeled a, or every loop at state X labeled a, create the production

X → aY or X → aX

Do the same for b-edges.

Step 3 For every final state X, create the production

X → Λ
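The three steps can be sketched as a function. The input format is an assumption: `transitions` maps `(state, letter)` to the next state. The sketch also assumes that no state other than the start state is already named S; `fa_to_cfg` is a hypothetical name.

```python
def fa_to_cfg(transitions, start, finals):
    """Sketch of the construction: returns a set of (left side, right side)
    productions, with "Λ" standing for the empty right side."""

    def name(state):
        # Step 1: the nonterminals are the state names, start state renamed S
        return "S" if state == start else state

    productions = set()
    for (state, letter), target in transitions.items():
        # Step 2: an edge labeled `letter` gives  state -> letter target
        productions.add((name(state), letter + name(target)))
    for state in finals:
        # Step 3: every final state gets a Λ-production
        productions.add((name(state), "Λ"))
    return productions


delta = {("S", "a"): "M", ("S", "b"): "S", ("M", "a"): "F",
         ("M", "b"): "S", ("F", "a"): "F", ("F", "b"): "F"}
for left, right in sorted(fa_to_cfg(delta, "S", {"F"})):
    print(left, "->", right)
```

The sample FA here is the one whose CFG is worked out in the example below; the printed productions are the same seven.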

Claim 

This CFG generates exactly the language accepted by the original FA. To prove this claim, 
we must show that (i) every word accepted by the FA can be generated from the CFG and 
(ii) every word generated by the CFG is accepted by the FA. 

Proof of (i) 

Let w be some word, say, abbaa, accepted by the FA; then letter by letter, we can grow the path through the FA by a sequence of semipaths, the string read from the input so far followed by the name of the state to which the string takes us. The sequence of semipaths looks something like this:



                                          Semipaths
First start in S.                         S
Then read an a and go to X.               aX
Then read a b and go to Y.                abY
. . .                                     . . .
Finally read an a and go to F.            abbaaF
F is a final state, so accept the word.


This corresponds exactly to a derivation in the CFG of the word w through semiwords:

Production      Derivation
S → aX          S ⇒ aX
X → bY            ⇒ abY
. . .             . . .
                  ⇒ abbaaF
F → Λ             ⇒ abbaa


In summary, a word w accepted by the FA generates a sequence of step-by-step semipaths, each one edge longer than the previous, that corresponds to a derivation of w through semiwords identical to the semipaths. Since the word w is accepted by the FA, its semipath ends in a final state. In the derivation, this is the same as replacing the last nonterminal of the last semiword with Λ and completing the generation of w.


EXAMPLE (in the middle of the proof) 

Consider the FA 



The CFG the algorithm tells us to create is 

S → aM
S → bS
M → aF
M → bS
F → aF
F → bF
F → Λ

The word babbaaba is accepted by this FA through this sequence of semipaths:

S
bS
baM
babS
babbS
babbaM
babbaaF
babbaabF
babbaabaF
babbaaba

corresponding to the CFG derivation applying, in order, the productions S → bS, S → aM, M → bS, S → bS, S → aM, M → aF, F → bF, F → aF, F → Λ. ■


Proof of (ii) 

We now show that any word generated from the CFG created by the algorithm is accepted 
when run on the FA. 
Because all the rules of production, other than the Λ-productions that finish a word, are of the form

Nonterminal → terminal Nonterminal

there will always be one nonterminal in any working string in any derivation in this CFG, and that nonterminal will be on the extreme right end. Therefore, all derivations in this CFG are through working strings that are semiwords, exclusively. Each derivation starts with an S, and the sequence of semiwords corresponds to a growing sequence of semipaths through the FA. We can only end the generation of a word when we turn the final nonterminal into Λ, but this means that the state the semipath is in is a final state and the word generated is an input string accepted by the FA.
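Both halves of the claim can be spot-checked mechanically on the example FA. In the sketch below, the FA and its grammar are hard-coded as hypothetical dictionaries `DELTA` and `CFG`. It grows semiwords exactly as in the proof and compares the words obtained with the words the FA accepts, up to a length bound. A finite check like this illustrates the theorem; it does not prove it.

```python
from itertools import product

# The example FA (start S, final F) and the CFG built from it.
DELTA = {("S", "a"): "M", ("S", "b"): "S", ("M", "a"): "F",
         ("M", "b"): "S", ("F", "a"): "F", ("F", "b"): "F"}
FINALS = {"F"}
CFG = {"S": ["aM", "bS"], "M": ["aF", "bS"], "F": ["aF", "bF", ""]}  # "" is Λ


def fa_accepts(word):
    state = "S"
    for letter in word:
        state = DELTA[(state, letter)]
    return state in FINALS


def cfg_words(max_len):
    """All words of length <= max_len derivable from S. Every working string
    is a semiword, stored as (terminal prefix, trailing nonterminal)."""
    words, frontier = set(), [("", "S")]
    while frontier:
        prefix, nonterminal = frontier.pop()
        for rhs in CFG[nonterminal]:
            if rhs == "":
                words.add(prefix)  # N -> Λ ends the derivation
            elif len(prefix) < max_len:
                # N -> (letter)(nonterminal) extends the semiword by one letter
                frontier.append((prefix + rhs[0], rhs[1]))
    return words


def fa_words(max_len):
    return {w for k in range(max_len + 1)
            for w in map("".join, product("ab", repeat=k)) if fa_accepts(w)}


print(cfg_words(6) == fa_words(6))  # True
```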


EXAMPLE

The language of all words with an even number of a's (with at least some a's) can be accepted by this FA:

Calling the states S, M, and F as before, we have the following corresponding set of productions:

S → bS | aM
M → bM | aF
F → bF | aM | Λ

We have already seen two CFGs for this language, but this CFG is substantially different. ■

Theorem 21, on p. 259, was discovered (or perhaps invented) by Noam Chomsky and 
George A. Miller in 1958. They also proved the result below, which seems to be the flip side 
of the coin. 


THEOREM 22 


If all the productions in a given CFG fit one of the two forms:

Nonterminal → semiword

Nonterminal → word

(where the word may be Λ), then the language generated by this CFG is regular.
PROOF

We shall prove that the language generated by such a CFG is regular by showing that there is a TG that accepts the same language. We shall build this TG by constructive algorithm.

Let us consider a general CFG in this form:

N_1 → w_1 N_2
N_1 → w_2 N_3
N_2 → w_3 N_4
. . .

where the N's are the nonterminals, the w's are strings of terminals, and the parts w_y N_z are the semiwords used in productions. One of these N's must be S. Let N_1 = S.

Draw a small circle for each N and one extra circle labeled +. The circle for S we label −.

For every production rule of the form

N_x → w_y N_z

draw a directed edge from state N_x to N_z and label it with the word w_y.

If N_x = N_z, the path is a loop. For every production rule of the form

N_p → w_q

draw a directed edge from N_p to + and label it with the word w_q, even if w_q = Λ.



We have now constructed a transition graph. Any path in this TG from − to + corresponds to a word in the language of the TG (by concatenating labels) and simultaneously corresponds to a sequence of productions in the CFG generating the same word. Conversely, every production of a word in this CFG:

S ⇒ wN ⇒ wwN ⇒ wwwN ⇒ . . . ⇒ wwwww

corresponds to a path in this TG from − to +.

Therefore, the language of this TG is exactly the same as that of the CFG. Therefore, the language of the CFG is regular. ■
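The TG built in the proof can be run directly as a search over (state, position) pairs. A sketch under assumptions: nonterminals are single uppercase letters, the name "+" is reserved for the final state, and right sides are strings, with "" standing for Λ. Edge labels may be Λ, so a visited set keeps the search from looping. `regular_grammar_accepts` is a hypothetical name.

```python
def regular_grammar_accepts(grammar, word, start="S"):
    """Accept `word` if some path from start to '+' in the TG of the
    grammar spells it. A right side ending in an uppercase letter is a
    semiword (edge to that state); otherwise it is a word (edge to '+')."""
    todo, visited = [(start, 0)], set()
    while todo:
        state, pos = todo.pop()
        if (state, pos) in visited:
            continue
        visited.add((state, pos))
        if state == "+":
            if pos == len(word):
                return True
            continue
        for rhs in grammar[state]:
            if rhs and rhs[-1].isupper():
                label, target = rhs[:-1], rhs[-1]
            else:
                label, target = rhs, "+"
            if word.startswith(label, pos):  # follow the edge if its label fits
                todo.append((target, pos + len(label)))
    return False


EVEN_EVEN = {"S": ["aaS", "bbS", "abX", "baX", ""],
             "X": ["aaX", "bbX", "abS", "baS"]}
print(regular_grammar_accepts(EVEN_EVEN, "abba"))  # True
print(regular_grammar_accepts(EVEN_EVEN, "aab"))   # False
```

The sample grammar is the regular CFG for EVEN-EVEN discussed later in this section.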















We should note that the fact that the productions in some CFGs are all in the required format does not guarantee that the grammar generates any words. If the grammar is totally discombobulated, the TG that we form from it will be crazy too and may accept no words. However, if the grammar generates a language of some words, then the TG produced earlier for it will accept that same language.


DEFINITION 

A CFG is called a regular grammar if each of its productions is of one of the two forms

Nonterminal → semiword

Nonterminal → word

The two previous proofs imply that all regular languages can be generated by regular 
grammars and all regular grammars generate regular languages. 

We must be very careful not to be carried away by the symmetry of these theorems. Despite both theorems, it is still possible that a CFG that is not in the form of a regular grammar can generate a regular language. In fact, we have already seen many examples of this very phenomenon.


EXAMPLE 


Consider the CFG

S → aaS | bbS | Λ

This is a regular grammar and so we may apply the algorithm to it. There is only one nonterminal, S, so there will be only two states in the TG: − and the mandated +. The only production of the form N_1 → w is S → Λ, so there is only one edge into + and that is labeled Λ. The productions S → aaS and S → bbS are of the form N_1 → wN_2 where the N's are both S. Because these are supposed to be made into paths from N_1 to N_2, they become loops from S back to S. These two productions will become two loops at −, one labeled aa and one labeled bb. The whole TG is shown below:

By Kleene's theorem (see Chapter 7), any language accepted by a TG is regular; therefore, the language generated by this CFG (which is the same) is regular. It corresponds to the regular expression (aa + bb)*. ■


EXAMPLE 


Consider the regular CFG

S → aaS | bbS | abX | baX | Λ
X → aaX | bbX | abS | baS

The algorithm tells us that there will be three states: −, X, +. Because there is only one production of the form Nonterminal → word, namely S → Λ, there is only one edge into +. The TG is

which we immediately see accepts our old friend, the language EVEN-EVEN. (Do not be fooled by the Λ edge to the + state. It is the same as relabeling the − state ±.) ■


EXAMPLE 


Consider the regular CFG 


S → aA | bB
A → aS | a
B → bS | b

The corresponding TG constructed by the algorithm in Theorem 22 (p. 262) is

The language of this CFG is exactly the same as that of the CFG two examples ago, except that it does not include the word Λ. This language can be defined by the regular expression (aa + bb)⁺. ■


KILLING Λ-PRODUCTIONS

We have not yet committed ourselves to a definite stand on the social acceptability of Λ-productions, that is, productions of the form

N → Λ

where N is any nonterminal. We have employed them, but we do not pay them equal wages. These Λ-productions will make our lives very difficult in the discussions to come, so we must ask ourselves, Do we need them at all?

Any context-free language in which Λ is a word must have some Λ-productions in its grammar since otherwise we could never derive the word Λ from S. This statement is obvious, but it should be given some justification. Mathematically, this is easy: We observe that Λ-productions are the only productions that shorten the working string. If we begin with the string S and apply only non-Λ-productions, we never develop a word of length 0.

However, there are some grammars that generate languages that do not include the word Λ, but that contain some Λ-productions anyway. One such CFG is

S → aX
X → Λ

Its language is the single word a. There are other CFGs that generate this same language that do not include any Λ-productions.

The following theorem, which is the work of Bar-Hillel, Perles, and Shamir, shows that Λ-productions are not necessary in a grammar for a context-free language that does not contain the word Λ. It proves an even stronger result.

THEOREM 23

If L is a context-free language generated by a CFG that includes Λ-productions, then there is a different context-free grammar that has no Λ-productions that generates either the whole language L (if L does not include the word Λ) or else generates the language of all the words in L that are not Λ.

PROOF

We prove this by providing a constructive algorithm that will convert a CFG that contains Λ-productions into a CFG that does not contain Λ-productions that still generates the same language with the possible exception of the word Λ.

Consider the purpose of the production

N → Λ

If we apply this production to some working string, say, abAbNaB, we get abAbaB. In other words, the net result is to delete N from the working string. If N was just destined to be deleted, why did we let it get in there in the first place? Just because N will come out does not mean we could have avoided putting it in originally.

Consider the following CFG for EVENPALINDROME (the language of all palindromes with an even number of letters):

S → aSa | bSb | Λ

In this grammar, we have the following possible derivation:

S ⇒ aSa
  ⇒ aaSaa
  ⇒ aabSbaa
  ⇒ aabbaa




We obviously need the nonterminal S in the production process even though we delete it from the derivation when it has served its purpose.

The following rule seems to take care of using and deleting the nonterminals involved in Λ-productions.

Proposed Replacement Rule

If, in a certain CFG, there is a production of the form

N → Λ

among the set of productions, where N is any nonterminal (even S), then we can modify the grammar by deleting this production and adding the following list of productions in its place.

For all productions of the form

X → (blah 1) N (blah 2)

where X is any nonterminal (even S or N) and where (blah 1) and (blah 2) are anything at all (even involving N), add the production

X → (blah 1)(blah 2)

Notice that we do not delete the production X → (blah 1)N(blah 2), only the production N → Λ.

For all productions that involve more than one N on the right side, add new productions that have the same other characters but that have all possible subsets of N's deleted.

For example, the production

X → aNbNa

makes us add

X → abNa (deleting only the first N)
X → aNba (deleting only the second N)
X → aba (deleting both N's)

Also, the possible production

X → NN

makes us add

X → N (deleting one N)
X → Λ (deleting both N's)

Instead of using a production with an N and then dropping the N later to form the word w, we simply use the correct form of the production with the appropriate N already dropped when generating w. There is then no need to remove N later and so no need for the Λ-production. This modification of the CFG will produce a new CFG that generates exactly the same words as the first grammar with the possible exception of the word Λ. This is the end of the proposed replacement rule.
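One application of the proposed replacement rule can be sketched as follows. The assumptions are single-character symbols and right sides stored as strings, with "" for Λ; `replace_null_production` is a hypothetical name. The sketch applies the rule once, for one nonterminal. As the example X → NN shows, it can itself introduce a new Λ-production.

```python
from itertools import product


def replace_null_production(grammar, n):
    """Delete n -> Λ and, for every production containing n, add a copy
    for each nonempty subset of the occurrences of n deleted.

    grammar maps each nonterminal to a set of right sides ("" is Λ).
    Returns a new grammar; the input is left unchanged."""
    new = {lhs: set(rhss) for lhs, rhss in grammar.items()}
    new[n].discard("")
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            positions = [i for i, c in enumerate(rhs) if c == n]
            for keep in product([True, False], repeat=len(positions)):
                if all(keep):
                    continue  # the original production stays as it is
                dropped = {p for p, k in zip(positions, keep) if not k}
                new[lhs].add("".join(c for i, c in enumerate(rhs) if i not in dropped))
    return new


print(replace_null_production({"S": {"aSa", "bSb", ""}}, "S"))
# e.g. {'S': {'aSa', 'bSb', 'aa', 'bb'}}  (set order may vary)
```

Applied to EVENPALINDROME it yields S → aSa | bSb | aa | bb, the grammar obtained by hand in the next paragraph.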

Let us see what happens when we apply this replacement rule to the following CFG for EVENPALINDROME:

S → aSa | bSb | Λ

We remove the production S → Λ and replace it with S → aa and S → bb, which are the first two productions with the right-side S deleted.

The CFG is now

S → aSa | bSb | aa | bb

which also generates EVENPALINDROME, except for the word Λ, which can no longer be derived.

For example, the following derivation is generated in the old CFG:

Derivation        Production Used
S ⇒ aSa           S → aSa
  ⇒ aaSaa         S → aSa
  ⇒ aabSbaa       S → bSb
  ⇒ aabbaa        S → Λ

In the new CFG, we can combine the last two steps into one:

Derivation        Production Used
S ⇒ aSa           S → aSa
  ⇒ aaSaa         S → aSa
  ⇒ aabbaa        S → bb




We do not eliminate the entire possibility of using S to form words. 

We can now use this proposed replacement rule to describe an algorithm for eliminating 
all A-productions from a given grammar. 

If a particular CFG has several nonterminals with Λ-productions, then we replace these
Λ-productions one by one following the steps of the proposed replacement rule. As we saw,
we will get more productions (new right sides by deleting some N’s) but shorter derivations
(by combining the steps that formerly employed Λ-productions). We end up with a CFG that
generates the exact same language as the original CFG (with the possible exception of the
word Λ) but that has no Λ-productions.

A little discussion is in order here to establish not only that the new CFG actually does
generate all the non-Λ words the old CFG does but also that it generates no new words that
the old CFG did not.

We must observe that the new rules of production added do not lead to the generation of
any new words that were not capable of being generated from the old CFG. This is because
the new production has the same effect as the application of two old rules: instead of
using X → (new N-deleted string), we could employ the two steps X → (old string with N)
and then N → Λ.

Before we claim that this constructive algorithm provides the whole proof, we must ask
whether or not it is finite. It seems that if we start with some nonterminals N₁, N₂, N₃, which
have Λ-productions and we eliminate these Λ-productions one by one until there are none
left, nothing can go wrong. Can it?

What can go wrong is that the proposed replacement rule may create new Λ-productions
that cannot themselves be removed without again creating more. For example, in this grammar

S → a | Xb | aYa
X → Y | Λ
Y → b | X

we have the Λ-production

X → Λ



so by the replacement rule we can eliminate this production and put in its place the
additional productions

S → b (from S → Xb)
and

Y → Λ (from Y → X)

But now we have created a new Λ-production that was not there before. So, we still
have the same number of Λ-productions we started with. If we now use the proposed
replacement rule to get rid of Y → Λ, we get

S → aa (from S → aYa)
and

X → Λ (from X → Y)

But we have now recreated the production X → Λ. So, we are back with our old
Λ-production. In this particular case, the proposed replacement rule will never eliminate all
Λ-productions even in hundreds of applications.

Therefore, unfortunately, we do not yet have a proof of this theorem. However, we can 
take some consolation in having created a wonderful illustration of the need for careful 
proofs. Never again will we think that the phrase “and so we see that the algorithm is finite” 
is a silly waste of words. 

Despite the apparent calamity, all is not lost. We can perform an ancient mathematical
trick and patch up the proof. The trick is to eliminate all the Λ-productions simultaneously.


DEFINITION (inside the proof of Theorem 23) 

In a given CFG, we call a nonterminal N nullable if
There is a production N → Λ, or
There is a derivation that starts at N and leads to Λ:

N ⇒ · · · ⇒ Λ ■

As we have seen, all nullable nonterminals are dangerous. We now state the careful
formulation of the algorithm.

Modified Replacement Rule 

1. Delete all Λ-productions.

2. Add the following productions: For every production

X → old string

add new productions of the form X → · · · , where the right side will account for any
modification of the old string that can be formed by deleting all possible subsets of
nullable nonterminals, except that we do not allow X → Λ to be formed even if all the
characters in this old string are nullable.
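Under some illustrative assumptions of our own, step 2 can be sketched in Python. A grammar is represented as a dict from each nonterminal to a list of right sides, each right side a tuple of symbols, with the empty tuple () standing for Λ; the function name remove_lambda_productions is hypothetical. The sketch takes the set of nullable nonterminals as given, since finding that set is a separate question.

```python
from itertools import product


def remove_lambda_productions(grammar, nullable):
    """Modified replacement rule, given the set of nullable nonterminals.

    grammar: dict nonterminal -> list of right sides (tuples of symbols);
             the empty tuple () stands for a Λ-production.
    nullable: set of nullable nonterminals (assumed to be known already).
    """
    new_grammar = {}
    for left, right_sides in grammar.items():
        new_rights = []
        for rhs in right_sides:
            if not rhs:
                continue  # step 1: delete every Λ-production
            # step 2: each nullable occurrence may independently be kept or deleted,
            # which enumerates all subsets of nullable occurrences
            choices = [((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs]
            for pick in product(*choices):
                candidate = tuple(s for part in pick for s in part)
                # never form X → Λ, and do not list the same right side twice
                if candidate and candidate not in new_rights:
                    new_rights.append(candidate)
        new_grammar[left] = new_rights
    return new_grammar
```

Applied to the grammar S → a | Xb | aYa, X → Y | Λ, Y → b | X with X and Y nullable, it should give the same new CFG the text derives below.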

For example, in the CFG 

S → a | Xb | aYa
X → Y | Λ
Y → b | X
we find that X and Y are nullable. So when we delete X → Λ, we have to check all
productions that include X or Y to see what new productions to add:


Old Productions        Productions Newly
with Nullables         Formed by the Rule

X → Y                  Nothing
X → Λ                  Nothing
Y → X                  Nothing
S → Xb                 S → b
S → aYa                S → aa


The new CFG is 


S → a | Xb | aYa | b | aa

X → Y

Y → b | X

It has no A-productions but generates the same language. 

This modified replacement rule works the way we thought the first replacement rule
would work, that is, by looking ahead at which nonterminals in the working string will be
eliminated by Λ-productions and offering alternate substitutions in which the nullables have
already been eliminated.

Before we conclude this proof, we should ask ourselves whether the modified
replacement rule is really workable, that is, is it an effective procedure in the sense of our use of
that term in Chapter 11? To apply the modified replacement rule, we must be able to identify
all the nullable nonterminals at once. How can we do this if the grammar is complicated?
For example, in the CFG 

S->Xay | fV | aX | ZYX 

X → Za | bZ | ZZ | Xb

r-fclJUriA 1 

Z → aX | YYY

all the nonterminals are nullable, as we can see from 

S ⇒ ZYX ⇒ YYYYX ⇒ YYYYZZ ⇒ YYYYYYYZ ⇒ YYYYYYYYYY
  ⇒ · · · ⇒ ΛΛΛΛΛΛΛΛΛΛ = Λ

The solution to this problem is blue paint (the same shade used in Chapter 11). Let us
start by painting all the nonterminals with Λ-productions blue. We paint every occurrence of
them, throughout the entire CFG, blue. Now for step 2, we paint blue all nonterminals that
produce solid blue strings. For example, if

S → ZYX

and Z, Y, and X are all blue, then we paint S blue. Paint all other occurrences of S throughout
the CFG blue too. As with the FAs, we repeat step 2 until nothing new is painted. At this 
point all nullable nonterminals will be blue. 

This is an effective decision procedure to determine all nullables, and therefore the 
modified replacement rule is also effective. 
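The blue-paint procedure is a fixed-point loop, and a minimal Python sketch of it might look like this. It assumes an illustrative representation of our own: a dict from nonterminals to lists of right sides (tuples of symbols), with () for Λ. The name nullable_nonterminals is hypothetical, and painting a nonterminal blue is modeled as adding it to a set.

```python
def nullable_nonterminals(grammar):
    """Blue-paint procedure: return the set of nullable nonterminals.

    grammar: dict nonterminal -> list of right sides (tuples of symbols);
             the empty tuple () stands for a Λ-production.
    """
    # step 1: paint every nonterminal that has a Λ-production
    blue = {n for n, rights in grammar.items() if () in rights}
    changed = True
    while changed:  # repeat step 2 until nothing new is painted
        changed = False
        for n, rights in grammar.items():
            if n in blue:
                continue
            # paint n if some right side is a solid string of blue nonterminals;
            # terminals are never in the set, so they block the painting
            if any(all(sym in blue for sym in rhs) for rhs in rights):
                blue.add(n)
                changed = True
    return blue
```

On the grammar S → a | Xb | aYa, X → Y | Λ, Y → b | X used above, this should return {X, Y}. Each pass either paints a new nonterminal or stops, so the loop runs at most once per nonterminal plus one final pass.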

This then successfully concludes the proof of this theorem. ■ 
EXAMPLE

Let us consider the following CFG for the language defined by (a + b)*a: 

S → Xa

X → aX | bX | Λ

The only nullable nonterminal here is X, and the productions that have right sides
including X are:

Productions          New Productions
with Nullables       Formed by the Rule

S → Xa               S → a
X → aX               X → a
X → bX               X → b

The full new CFG is

S → Xa | a

X → aX | bX | a | b

To produce the word baa, we formerly used the derivation:

Derivation           Production Used
S ⇒ Xa               S → Xa
  ⇒ bXa              X → bX
  ⇒ baXa             X → aX
  ⇒ baa              X → Λ

Now we combine the last two steps, and the new derivation in the new CFG is

S ⇒ Xa               S → Xa
  ⇒ bXa              X → bX
  ⇒ baa              X → a

Because Λ was not a word generated by the old CFG, the new CFG generates exactly
the same language. ■


EXAMPLE 


Consider this inefficient CFG for the language defined by (a + b)*bb(a + b)*:

S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ

From X we can derive any word ending in b; from Y we can derive any word starting
with b. Therefore, from S we can derive any word with a double b.

Obviously, A and B are nullable. Based on that, Z → AB makes Z also nullable. After
that, we see that W is also nullable. X, Y, and S remain nonnullable. Alternately, of course,
we could have arrived at this by azure artistry.

The modified replacement algorithm tells us to generate new productions to replace the
Λ-productions as follows:

Old          Additional New Productions Derived from Old
X → Zb       X → b
Y → bW       Y → b
Z → AB       Z → A and Z → B
W → Z        Nothing new
A → aA       A → a
A → bA       A → b
B → Ba       B → a
B → Bb       B → b
Remember, we do not eliminate all of the old productions, only the old Λ-productions.


The fully modified new CFG is 


S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b

Because Λ was not a word generated by the old CFG, the new CFG generates exactly
the same language. ■

KILLING UNIT PRODUCTIONS 




We now eliminate another needless oddity that plagues some CFGs. 


DEFINITION 


A production of the form

Nonterminal → one Nonterminal

is called a unit production.


Bar-Hillel, Perles, and Shamir tell us how to get rid of these too.




THEOREM 24 

If there is a CFG for the language L that has no Λ-productions, then there is also a CFG for
L with no Λ-productions and no unit productions.
PROOF


This will be another proof by constructive algorithm. 

First, we ask ourselves what is the purpose of a production of the form

A → B

where A and B are nonterminals.

We can use it only to change some working string of the form

(blah)A(blah)

into the working string

(blah)B(blah)


Why would we want to do that? We do it because later we want to apply a production to the
nonterminal B that is different from any that we could produce from A. For example,

B → (string)

(blah)A(blah) ⇒ (blah)B(blah) ⇒ (blah)(string)(blah)

which is a change we could not make without using A → B, because we had no production
A → (string).

It seems simple then to say that instead of unit productions all we need is A → (string).
We now formulate a replacement rule for eliminating unit productions. 

Proposed Elimination Rule 

If A → B is a unit production and all the productions starting with B are

B → s₁ | s₂ | . . .

where s₁, s₂, . . . are strings, then we can drop the production A → B and instead include
these new productions:

A → s₁ | s₂ | . . .

Again, we ask ourselves, will repeated applications of this proposed elimination rule result 
in a grammar that does not include unit productions but defines exactly the same language? 

The answer is that we still have to be careful. A problem analogous to the one that arose 
before can strike again. 

The set of new productions we create may give us new unit productions. For example, if 
we start with the grammar 

S → A | bb
A → B | b
B → S | a

and we try to eliminate the unit production A → B, we get instead

A → S | a

to go along with the old productions we are retaining. The CFG is now

S → A | bb
A → b | S | a
B → S | a

We still have three unit productions:

S → A,    A → S,    B → S


If we now try to eliminate the unit production B → S, we create the new unit production
B → A. If we then use the proposed elimination rule on B → A, we will get back B → S.

As was the case with Λ-productions, we must get rid of all unit productions in one fell
swoop to avoid infinite circularity.

Modified Elimination Rule 

For every pair of nonterminals A and B, if the CFG has a unit production A → B or if there is
a chain of unit productions leading from A to B, such as

A ⇒ X₁ ⇒ X₂ ⇒ · · · ⇒ B

where X₁, X₂ are some nonterminals, we then introduce new productions according to the
following rule: If the nonunit productions from B are

B → s₁ | s₂ | s₃ | . . .

where s₁, s₂, and s₃ are strings, create the productions

A → s₁ | s₂ | s₃ | . . .

We do the same for all such pairs of A’s and B’s simultaneously. We can then eliminate
all unit productions.


This is what we meant to do originally. If in the derivation for some word w the
nonterminal A is in the working string and it gets replaced by a unit production A → B, or by a
sequence of unit productions leading to B, and further if B is replaced by the production
B → s₄, we can accomplish the same thing and derive the same word w by employing the
production A → s₄ directly in the first place.

This modified elimination rule avoids circularity by removing all unit productions at
once. If the grammar contains no Λ-productions, it is not a hard task to find all sequences of
unit productions A → S₁ → S₂ → · · · → B, because there are only finitely many unit
productions and they chain up in only obvious ways. In a grammar with Λ-productions and nullable
nonterminals X and Y, the production S → ZYX is essentially a unit production. There are no
Λ-productions allowed by the hypothesis of the theorem, so no such difficulty is possible.

The modified method described in the proof is an effective procedure and it proves the
theorem. ■
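A Python sketch of the modified elimination rule follows, under an illustrative representation of our own (a dict from nonterminals to lists of right sides, each a tuple of symbols) and assuming, as the theorem does, that there are no Λ-productions. A right side counts as a unit production when it is a single symbol that is itself a nonterminal. The function name remove_unit_productions is hypothetical; finding all chains of unit productions is done here as an ordinary reachability search.

```python
def remove_unit_productions(grammar):
    """Modified elimination rule: remove all unit productions at once.

    grammar: dict nonterminal -> list of right sides (tuples of symbols),
             assumed to contain no Λ-productions.
    """
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in grammar

    new_grammar = {}
    for a in grammar:
        # every B reachable from a by a chain of unit productions (a itself included)
        reachable, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for rhs in grammar[x]:
                if is_unit(rhs) and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    stack.append(rhs[0])
        # a receives every nonunit right side of every such B; unit productions are dropped
        rights = []
        for b in sorted(reachable):
            for rhs in grammar[b]:
                if not is_unit(rhs) and rhs not in rights:
                    rights.append(rhs)
        new_grammar[a] = rights
    return new_grammar
```

Because the reachable set is computed before any production is added, circularity like the S → A → B → S loop cannot make the procedure run forever. On the troubling grammar S → A | bb, A → B | b, B → S | a, each nonterminal should end up with the right sides bb, b, and a.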

EXAMPLE 

Let us reconsider the troubling example mentioned in the proof above: 

S → A | bb

A → B | b

B → S | a

Let us separate the units from the nonunits: 

Unit Productions      Decent Folks

S → A                 S → bb

A → B                 A → b

B → S                 B → a
We list all unit productions and sequences of unit productions, one nonterminal at a
time, tracing each nonterminal through each sequence it heads. Then we create the new
productions that allow the first nonterminal to be replaced by any of the strings that could
replace the last nonterminal in the sequence.

S → A gives S → b

S → A → B gives S → a

A → B gives A → a

A → B → S gives A → bb

B → S gives B → bb

B → S → A gives B → b

The new CFG for this language is 

S → bb | b | a
A → b | a | bb
B → a | bb | b

which has no unit productions.

Parenthetically, we may remark that this particular CFG generates a finite language 
since there are no nonterminals in any string produced from S. ■ 

CHOMSKY NORMAL FORM 

In our next result, we will separate the terminals from the nonterminals in CFG productions. 

THEOREM 25 

If L is a language generated by some CFG, then there is another CFG that generates all the
non-Λ words of L, all of whose productions are of one of two basic forms:

Nonterminal → string of only Nonterminals
Nonterminal → one terminal

PROOF 

The proof will be by constructive algorithm. Let us suppose that in the given CFG the
nonterminals are S, X₁, X₂, . . . .

Let us also assume that the terminals are a and b. 

We now add two new nonterminals A and B and the productions

A → a

B → b

Now for every previous production involving terminals, we replace each a with the
nonterminal A and each b with the nonterminal B. For example,

X₃ → X₄aX₁SbbX₇a

becomes

X₃ → X₄AX₁SBBX₇A
which is a string of solid nonterminals.
Even if we start with a string of solid terminals

X₆ → aaba

we convert it into a string of solid nonterminals

X₆ → AABA

All our old productions are now of the form 

Nonterminal → string of Nonterminals
and the two new productions are of the form

Nonterminal → one terminal

Any derivation that formerly started with S and proceeded down to the word 

aaabba 

will now follow the same sequence of productions to derive the string 

AAABBA

from the start symbol S. From here we apply A → a and B → b a number of times to
generate the word aaabba. This convinces us that any word that could be generated by the original
CFG can also be generated by the new CFG.

We must also note that any word generated by the new CFG could also be generated by
the old CFG. Any derivation in the new CFG is a sequence of applications of those
productions that are modified old productions and the two totally new productions from A and B.
Because these two new productions are the replacement of one nonterminal by one terminal,
nothing they introduce into the working string is itself replaceable. They do not interact with
the other productions.

If the letters A and B were already nonterminals in the CFG to start with, then any two
other unused symbols would serve as well. Therefore, this new CFG proves the theorem. ■


EXAMPLE 


Let us start with the CFG 


>X ] | X 2 aX 2 | aSb 


>aX- I aaX, 


After the conversion, we have 


>x 1 ax 2 

*ASB 

>B 


We have not employed the disjunction slash |, but instead have written out all the
productions separately so that we may observe eight of the form

Nonterminal —* string of Nonterminals 
and two of the form

Nonterminal → one terminal ■

In all cases where the algorithm of the theorem is applied, the new CFG has the same 
number of terminals as the old CFG and more nonterminals (one new one for each terminal). 

As with all our proofs by constructive algorithm, we have not said that this new CFG is 
the best example of a CFG that fits the desired format. We say only that it is one of those that 
satisfy the requirements. 

One problem is that we may create unit productions where none existed before. For
example, if we follow the algorithm to the letter of the law,

X → a

will become

X → A

A → a

To avoid this problem, we should add a clause to our algorithm saying that any
productions that we find that are already in one of the desired forms should be left alone: “If it ain’t
broke, don’t fix it.” Then we do not run the risk of creating unit productions (or
Λ-productions for that matter).
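A Python sketch of the construction of Theorem 25, including the “leave it alone” clause just described, might look like this. The representation (a dict from nonterminals to lists of tuple right sides) is our own, and fresh nonterminal names of the form <a>, <b> are a hypothetical naming scheme assumed not to clash with existing symbols; the text itself uses A and B.

```python
def separate_terminals(grammar, terminals):
    """Theorem 25 construction with the "if it ain't broke" clause.

    grammar: dict nonterminal -> list of right sides (tuples of symbols).
    terminals: iterable of terminal symbols.
    Fresh nonterminals are named "<t>" (hypothetical, assumed unused).
    """
    fresh = {t: f"<{t}>" for t in terminals}
    new_grammar = {}
    for left, rights in grammar.items():
        new_rights = []
        for rhs in rights:
            if len(rhs) == 1:
                # already "Nonterminal → one terminal" (or a unit production): leave it alone
                new_rights.append(rhs)
            else:
                # replace each terminal inside a longer right side by its fresh nonterminal
                new_rights.append(tuple(fresh.get(sym, sym) for sym in rhs))
        new_grammar[left] = new_rights
    # the new productions of the form  <t> → t
    for t, name in fresh.items():
        new_grammar[name] = [(t,)]
    return new_grammar
```

With the student's grammar S → Na, N → a | b from the next example, this should give S → N<a>, N → a | b, and <a> → a, plus an unused <b> → b.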


EXAMPLE 

One student thought that it was a waste of effort to introduce a new nonterminal to stand for 
a if the CFG already contained a production of the form nonterminal → a. Why not simply
replace all a’s in long mixed strings by this nonterminal? For instance, why cannot

S → Na
N → a | b

become

S → NN
N → a | b

The answer is that bb is not generated by the first grammar, but it is by the second. The
correct modified form is

S → NA
N → a | b

A → a ■


EXAMPLE 


The CFG

S → XY
X → XX | a
Y → YY | b

(which generates aa*bb* and which is already in the desired format) would, if we
mindlessly attacked it with our algorithm, become

S → XY
X → XX
Y → YY
X → A
Y → B
A → a
B → b

which is also in the desired format but has unit productions. When we get rid of these
unit productions using the algorithm of Theorem 24 (p. 272), we return to the original
CFG.

To the true theoretician, this meaningless waste of energy costs nothing. The goal was to
prove the existence of an equivalent grammar in the specified format. The virtue here is to
find the shortest, most understandable, and most elegant proof, not an algorithm with dozens
of messy clauses and exceptions. The problem of finding the best such grammar is also a
question theoreticians are interested in, but it is not the question presented in Theorem 25
(p. 275). ■

The purpose of Theorem 25 was to prepare the way for the following format and
theorem developed by Chomsky.


DEFINITION 

If a CFG has only productions of the form

Nonterminal → string of exactly two Nonterminals

or of the form

Nonterminal → one terminal
it is said to be in Chomsky Normal Form, or CNF.

THEOREM 26 

For any context-free language L, the non-Λ words of L can be generated by a grammar in
which all productions are in CNF.


Let us be careful to realize that any context-free language that does not contain Λ as a
word has a CFG in CNF that generates exactly it. However, if a CFL contains Λ, then when
its CFG is converted by the algorithms above into CNF, the word Λ drops out of the
language, while all other words stay the same.


PROOF 

The proof will be by constructive algorithm. 

From Theorems 23 and 24 we know that there is a CFG for L (or for all L except Λ) that
has no Λ-productions and no unit productions.








Let us suppose further that we start with a CFG for L that we have made to fit the form 
specified in Theorem 25. Let us suppose its productions are 



s- 

s- 

S- 


>* 1 * 2 * 3 * 8 X l ~+X 3 X 4 X l0 X 4 


>* 3*5 


x t 


>*4*9 


The productions of the form

Nonterminal → one terminal

we leave alone. We must now make the productions with right sides having many
nonterminals into productions with right sides that have only two nonterminals.

For each production of the form

Nonterminal → string of Nonterminals

we propose the following expansion that involves the introduction of the new nonterminals
R₁, R₂, . . . . The production

S → X₁X₂X₃X₈

should be replaced by

S → X₁R₁

where

R₁ → X₂R₂

and where

R₂ → X₃X₈

We use these new nonterminals nowhere else in the grammar; they are used solely to
split this one production into small pieces. If we need to expand more productions, we
introduce new R’s with different subscripts.

Let us think of this as

S → X₁(rest₁)           (where rest₁ = X₂X₃X₈)
(rest₁) → X₂(rest₂)     (where rest₂ = X₃X₈)
(rest₂) → X₃X₈


This trick works just as well if we start with an odd number of nonterminals on the
right-hand side of the production:

X₈ → X₄X₁X₁X₃X₉

should be replaced by

X₈ → X₄R₄     (where R₄ = X₁X₁X₃X₉)
R₄ → X₁R₅     (where R₅ = X₁X₃X₉)
R₅ → X₁R₆     (where R₆ = X₃X₉)
R₆ → X₃X₉

In this way, we can convert productions with long strings of nonterminals into
sequences of productions with exactly two nonterminals on the right side. As with the previous
theorem, we are not finished until we have convinced ourselves that this conversion has not
altered the language the CFG generates. Any word formerly generated is still generatable by
virtually the same steps, if we understand that some productions have been expanded into
several productions that must be executed in sequence.
For example, in a derivation where we previously employed the production
we must now employ the sequence of productions: 


in exactly this order. 

We must also show that with all these additional new nonterminals and productions we
have not allowed any additional words to be generated. Let us observe that because the
nonterminal R₅ occurs in only the two productions

R₄ → X₁R₅
R₅ → X₁R₆

any sequence of productions that generates a working string using R₅ must have used

R₄ → X₁R₅

to get R₅ into the working string, and

R₅ → X₁R₆

to remove it from the final string.

This combination has the net effect of a production like

R₄ → X₁X₁R₆

Again, R₄ could have been introduced into the working string only by one specific
production. Also, R₆ can be removed only by one specific production. In fact, the net effect of these
R’s must be the same as the replacement of X₈ by X₄X₁X₁X₃X₉. Because we use different R’s
in the expansion of each production, the new nonterminals (R’s) cannot interact to give
new words. Each is on the right side of only one production and on the left side of only one
production. The net effect must be like that of the original production.

The new grammar generates the same language as the old grammar and is in the desired
form.
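The splitting step can be sketched in Python as follows. The sketch assumes an illustrative representation of our own (a dict from nonterminals to lists of tuple right sides) and a grammar already in the form of Theorem 25 with no Λ- or unit productions. New nonterminals are named R1, R2, . . . and are assumed not to clash with existing names; binarize is a hypothetical function name.

```python
def binarize(grammar):
    """Split every right side of three or more nonterminals into a chain of
    productions with exactly two nonterminals each (the Theorem 26 step).

    grammar: dict nonterminal -> list of right sides (tuples of symbols).
    New nonterminals R1, R2, ... are assumed not to clash with existing names.
    """
    new_grammar = {left: [] for left in grammar}
    counter = 0
    for left, rights in grammar.items():
        for rhs in rights:
            head = left
            while len(rhs) > 2:
                counter += 1
                r = f"R{counter}"  # a fresh R used in this one chain only
                new_grammar.setdefault(head, []).append((rhs[0], r))
                head, rhs = r, rhs[1:]  # R stands for the rest of the string
            new_grammar.setdefault(head, []).append(rhs)
    return new_grammar
```

Given the NONNULLPALINDROME grammar of the next example in its Theorem 25 form, this should produce S → AR1 | BR2 | AA | BB | a | b, R1 → SA, R2 → SB, A → a, B → b. Each fresh R appears on exactly one left side and one right side, which is the property the argument above relies on.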




EXAMPLE 


Let us convert 

S → aSa | bSb | a | b | aa | bb

(which generates the language PALINDROME except for Λ) into CNF. This language is
called NONNULLPALINDROME.

First, we separate the terminals from the nonterminals as in Theorem 25 (p. 275).

S → ASA
S → BSB
S → AA
S → BB
S → a
S → b
A → a
B → b

Notice that we are careful not to introduce the needless unit productions S → A and
S → B.

Now we introduce the R’s:

S → AR₁      S → AA
R₁ → SA      S → BB
S → BR₂      S → a
R₂ → SB      S → b
             A → a
             B → b

This is in CNF, but it is quite a mess. Had we not seen how it was constructed, we
would have some difficulty recognizing this grammar as a CFG for NONNULLPALINDROME.

If we include with this list of productions the additional production S → Λ, we have a
CFG for the entire language PALINDROME.

In languages without the word Λ, this procedure works smoothly. However, Λ is a word
in PALINDROME, and adding the production S → Λ will incorporate this word without
introducing any other (unwanted) words. ■


EXAMPLE 

Let us convert the CFG 

S → bA | aB
A → bAA | aS | a
B → aBB | bS | b

into CNF. Because we already use the symbols A and B in this grammar, let us call the new
nonterminals we need to incorporate to achieve the form of Theorem 25 X (for a) and Y
(for b).

The grammar becomes

S → YA       B → XBB
S → XB       B → YS
A → YAA      B → b
A → XS       X → a
A → a        Y → b

Notice that we have left well enough alone in two instances:

A → a and B → b
We need to simplify only two productions:

A → YAA becomes

A → YR₁
R₁ → AA

B → XBB becomes

B → XR₂
R₂ → BB


The CFG has now become

S → YA | XB
A → YR₁ | XS | a
B → XR₂ | YS | b
X → a
Y → b
R₁ → AA
R₂ → BB

which is in CNF. This is one of the more obscure grammars for the language EQUAL.


EXAMPLE 


Consider the CFG 


S → aaaaS | aaaa

which generates the language a⁴ⁿ for n = 1, 2, 3, . . . = {a⁴, a⁸, a¹², . . .}. We convert
this to CNF as follows: first into the form of Theorem 25

S → AAAAS

S → AAAA

A → a

which in turn becomes

S → AR₁
R₁ → AR₂
R₂ → AR₃
R₃ → AS
S → AR₄
R₄ → AR₅
R₅ → AA
A → a ■


LEFTMOST DERIVATIONS 

As the last topic in this chapter, we show that we can standardize not only the form of the
grammar, but also the form of the derivations.


DEFINITION 

The leftmost nonterminal in a working string is the first nonterminal that we encounter 
when we scan the string from left to right. ■ 
EXAMPLE


In the string abNbaXYa, the leftmost nonterminal is N. ■


DEFINITION 


If a word w is generated by a CFG by a certain derivation and at each step in the derivation,
a rule of production is applied to the leftmost nonterminal in the working string, then this
derivation is called a leftmost derivation. ■


EXAMPLE 


Consider the CFG 


S → aSX | b
X → Xb | a


The following is a leftmost derivation:

S ⇒ aSX
  ⇒ aaSXX
  ⇒ aabXX
  ⇒ aabXbX
  ⇒ aababX
  ⇒ aababa

At every stage in the derivation, the nonterminal replaced is the leftmost one. 
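A small Python sketch can replay the derivation above and check the leftmost condition at each step. It assumes, for illustration only, that every symbol is one character and that the caller supplies the set of nonterminals; leftmost_derivation is a hypothetical name.

```python
def leftmost_derivation(start, steps, nonterminals):
    """Replay a derivation, checking that each step rewrites the leftmost nonterminal.

    start: start symbol; working strings are plain str, one character per symbol.
    steps: list of (nonterminal, right_side) pairs, in the order they are applied.
    nonterminals: set of one-character nonterminal names.
    Returns the list of working strings, beginning with the start symbol.
    """
    working = start
    history = [working]
    for left, right in steps:
        # position of the first nonterminal met when scanning left to right
        i = next((k for k, ch in enumerate(working) if ch in nonterminals), None)
        if i is None or working[i] != left:
            raise ValueError(f"{left} -> {right} does not rewrite the leftmost nonterminal")
        working = working[:i] + right + working[i + 1:]
        history.append(working)
    return history


steps = [("S", "aSX"), ("S", "aSX"), ("S", "b"), ("X", "Xb"), ("X", "a"), ("X", "a")]
for line in leftmost_derivation("S", steps, {"S", "X"}):
    print(line)
```

Running it should print S, aSX, aaSXX, aabXX, aabXbX, aababX, and aababa, one per line, matching the derivation in the text.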


EXAMPLE 

Consider the CFG 

S → XY
X → XX | a
Y → YY | b

We can generate the word aaabb through several different production sequences, each 
of which follows one of these two possible derivation trees: 


(Figure: the two derivation trees for the word aaabb in this grammar.)
Each of these trees becomes a leftmost derivation when we specify in what order the steps
are to be taken. If we draw a dotted line similar to the one that traces the Łukasiewicz
notation for us, we see that it indicates the order of productions in the leftmost derivation. We
number the nonterminals in the order in which we first meet them on the dotted line. This is
the order in which they must be replaced in a leftmost derivation.


(Figure: the same two trees, with a dotted line traced around each and the nonterminals
numbered 1 through 9 in the order in which the line first meets them.)

Derivation I                 Derivation II

1. S ⇒ ẊY                    1. S ⇒ ẊY
2.   ⇒ ẊXY                   2.   ⇒ ẊXY
3.   ⇒ ẊXXY                  3.   ⇒ aẊY
4.   ⇒ aẊXY                  4.   ⇒ aẊXY
5.   ⇒ aaẊY                  5.   ⇒ aaẊY
6.   ⇒ aaaẎ                  6.   ⇒ aaaẎ
7.   ⇒ aaaẎY                 7.   ⇒ aaaẎY
8.   ⇒ aaabẎ                 8.   ⇒ aaabẎ
9.   ⇒ aaabb                 9.   ⇒ aaabb

In each of these derivations, we have drawn a dot over the head of the leftmost
nonterminal. It is the one that must be replaced in the next step if we are to have a leftmost derivation.

The method illustrated above can be applied to any derivation in any CFG. It therefore
provides a proof by constructive algorithm for the following theorem.
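The method can be sketched in Python: repeatedly expanding the leftmost unexpanded node of a tree visits the nonterminals in exactly the order the dotted line numbers them. The tree encoding (a pair of a nonterminal and its list of children, with terminals as one-character strings) and the name leftmost_from_tree are assumptions made for this illustration.

```python
def leftmost_from_tree(tree):
    """List the working strings of the leftmost derivation read off a derivation tree.

    A tree node is a pair (nonterminal, children); each child is either another
    node or a terminal given as a one-character string.
    """
    def label(node):
        return node[0] if isinstance(node, tuple) else node

    frontier = [tree]  # the current working string, kept as a list of nodes
    history = ["".join(label(n) for n in frontier)]
    while True:
        i = next((k for k, n in enumerate(frontier) if isinstance(n, tuple)), None)
        if i is None:  # only terminals remain: the word has been derived
            return history
        frontier[i:i + 1] = frontier[i][1]  # expand the leftmost nonterminal
        history.append("".join(label(n) for n in frontier))


a, b = "a", "b"
# the tree in which the left X below the top X → XX is expanded again
tree = ("S", [("X", [("X", [("X", [a]), ("X", [a])]), ("X", [a])]),
              ("Y", [("Y", [b]), ("Y", [b])])])
print(leftmost_from_tree(tree))
```

For this tree the working strings should be S, XY, XXY, XXXY, aXXY, aaXY, aaaY, aaaYY, aaabY, aaabb; the other tree gives S, XY, XXY, aXY, aXXY, and so on.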


THEOREM 27 

Any word that can be generated by a given CFG by some derivation also has a leftmost derivation. ■


EXAMPLE 


Consider the CFG 
To generate the symbolic logic formula

(p ⊃ (~p ⊃ q))

we use the following tree:

(Figure: derivation tree for (p ⊃ (~p ⊃ q)).)

Remember that the terminal symbols are ( ) ⊃ ~ p q. Because the only nonterminal is S, we
must always replace the leftmost S:

S ⇒ (S)

  ⇒ (S ⊃ S)

  ⇒ (p ⊃ S)

  ⇒ (p ⊃ (S))

  ⇒ (p ⊃ (S ⊃ S))

  ⇒ (p ⊃ (~S ⊃ S))

  ⇒ (p ⊃ (~p ⊃ S))

  ⇒ (p ⊃ (~p ⊃ q)) ■


PROBLEMS


1. Find CFGs that generate these regular languages over the alphabet Σ = {a b}:

(i) The language defined by (aaa + b)*.

(ii) The language defined by (a + b)*(bbb + aaa)(a + b)*.

(iii) All strings without the substring aaa.

(iv) All strings that end in b and have an even number of b’s in total.

(v) The set of all strings of odd length.

(vi) All strings with exactly one a or exactly one b.

(vii) All strings with an odd number of a’s or an even number of b’s.

2. For the seven languages of Problem 1, find CFGs for them that are in regular grammar 
format. 

For the following CFGs, find regular expressions that define the same language and
describe the language.
3. (i) S → aX | bS | a | b

X → aX | a

(ii) S → bS | aX | b
X → bX | aS | a

4. (i) S → aaS | abS | baS | bbS | Λ
(ii) S → aB | bA | Λ

A → aS

B → bS


5. (i) S → aB | bA
A → aB | a
B → bA | b
(ii) S → aS | bX | a
X → aX | bY | a
Y → aY | a


6. (i) S → aS | bX | a

X → aX | bY | bZ | a
Y → aY | a
Z → aZ | bW

W → aW | a

(ii) S → bS | aX

X → bS | aY
Y → aY | bY | a | b

7. (i) Starting with the alphabet

Σ = {a b ( ) + *}

find a CFG that generates all regular expressions.

(ii) Is this language regular?

8. Despite the fact that a CFG is not in regular form, it still might generate a regular
language. If so, this means that there is another CFG that defines the same language and
is in regular form. For each of the examples below, find a regular form version of the CFG:

(i) S → XYZ

X → aX | bX | Λ
Y → aY | bY | Λ


X- 

Y- 

(iii) S~ 
X- 
Y- 


>aX | bX 
>aY | bY | 
>aZ j A 

>xxx 

*aX j a 
>bY | b 
>XY 

*aX I Xa 


9. Show how to convert a TG into a regular grammar without first converting it to an FA. 

10. Let us, for the purposes of this problem only, allow a production of the form

N₁ → rN₂

where N₁ and N₂ are nonterminals and r is a regular expression. The meaning of this
formula is that in any working string we may substitute for N₁ any string wN₂, where w is a
word in the language defined by r. This can be considered a shorthand way of writing
an infinite family of productions, one for each word in the language of r.

Let a grammar be called bad if all its productions are of the two forms

N₁ → rN₂
N₁ → Λ

Bad grammars generate languages the same way CFGs do. 

Prove that even a bad grammar cannot generate a nonregular language, by showing 
how to construct one regular expression that defines the same language as the whole bad 
grammar. 

11. Each of the following CFGs has a production using the symbol Λ and yet Λ is not a
word in its language. Using the algorithm in this chapter, show that there are other CFGs
for these languages that do not use Λ-productions:

(i) S → aX | bX
X → a | b | Λ

(ii) S → aX | bS | a | b
X → aX | a | Λ

(iii) S → aS | bX
X → aX | Λ

(iv) S → XaX | bX
X → XaX | XbX | Λ

12. (i) Show that if a CFG does not have Λ-productions, then there is another CFG that
does have Λ-productions and generates the same language.

(ii) Show that if a CFG does not have unit productions, then there is another CFG that
does have unit productions and generates the same language.

13. Each of the following CFGs has unit productions. Using the algorithm presented
in this chapter, find CFGs for these same languages that do not have unit productions.

(i) S → aX | Yb

X → S

Y → bY | b

(ii) S → AA
A → B | BB
B → abB | b | bb

(iii) S → AB
A → B

B → aB | Bb | Λ

14. Convert the following CFGs to CNF:

(i) S → SS | a

(ii) S → aSa | SSa | a

(iii) S → aXX
X → aS | bS | a

(iv) E → E + E

E → (E)

E → 7

The terminals here are + * ( ) 7.

(v) S → ABABAB
A → a | Λ
B → b | Λ

Note that Λ is a word in this language, but when converted into CNF, the grammar
will no longer generate it.

(vi) S → SaS | SaSbS | SbSaS | Λ

(vii) S → AS | SB
A → BS | SA
B → SS

15. Convert the following CFGs with unit productions into CNF:

(i) S → X

(ii) S → SS | A
A → SS | AS | a

16. If L is a CFL that contains the word Λ and we Chomsky-ize its CFG into CNF and then
add on the sole extra production S → Λ, do we now generate all of L and only L?

17. (i) Find the leftmost derivation for the word abba in the grammar 


(ii) Find the leftmost derivation for the word abbabaabbbabbab in the CFG 

S → SSS | aXb
X → ba | bba | abb

18. Given a CFG in CNF and restricting all derivations of words to being leftmost
derivations, is it still possible that some word w has two nonidentical derivation trees? In other
words, is it still possible that the grammar is ambiguous?

19. Prove that any word that can be generated by a CFG has a rightmost derivation. 

20. Show that if L is any context-free language that does not contain the word Λ, then there
is a context-free grammar that generates L and has the property that the right-hand side
of every production is a string that starts with a terminal. In other words, all productions
are of the form

Nonterminal → terminal(arbitrary)
Automata


NEW FORMAT FOR FAs 

In Chapter 13, we saw that the class of languages generated by CFGs is properly larger than
the class of languages defined by regular expressions. This means that all regular languages
can be generated by CFGs, and so can some nonregular languages (e.g., {aⁿbⁿ} and
PALINDROME).

After introducing the regular languages defined by regular expressions, we found a class
of abstract machines (FAs) with the following dual property: For each regular language,
there is at least one machine that runs successfully only on the input strings from that
language, and for each machine in the class, the set of words it accepts is a regular language.
This correspondence was crucial to our deeper understanding of this collection of languages.
The pumping lemma, complements, intersection, decidability, and so on were all learned
from the machine aspect, not from the regular expression. We are now considering a different
class of languages, but we want to answer the same questions, so we would again like to
find a machine formulation. We are looking for a mathematical model of some class of
machines that correspond analogously to CFLs; that is, there should be at least one machine
that accepts each CFL, and the language accepted by each machine should be context-free.
We want CFL-recognizers or CFL-acceptors just as FAs are regular language-recognizers and -acceptors.
We are hopeful that an analysis of the machines will help us understand the class of
context-free languages in a deeper, more profound sense, just as an analysis of FAs led to
theorems about regular languages. In this chapter, we develop such a new type of machine.
In the next chapter, we prove that these new machines do indeed correspond to CFLs in the
way we desire. In subsequent chapters, we shall learn that the grammars have as much to
teach us about the machines as the machines do about the grammars.

To build these new machines, we start with our old FAs and throw in some new gadgets
that will augment them and make them more powerful. Such an approach does not necessarily
always work (a completely different design may be required), but this time it will (it is
a stacked deck).

What we shall do first is develop a slightly different pictorial representation for FAs, one 
that will be easy to augment with the new gizmos. 

We have, so far, not given a name to the part of the FA where the input string lives while 
it is being run. Let us call this the INPUT TAPE. The INPUT TAPE must be long enough 
for any possible input, and because any word in a* is a possible input, the TAPE must be
infinitely long (such a tape is very expensive). The TAPE has a first location for the first letter
of the input, then a second location, and so on. Therefore, we say that the TAPE is infinite in
one direction only. Some people use the silly term “half-infinite” for this condition (which is
like being half sober).

We draw the TAPE as shown here: 




The locations into which we put the input letters are called cells. We name the cells with
lowercase Roman numerals:




Below we show an example of an input TAPE already loaded with the input string aaba.
The character Δ is used to indicate a blank in a TAPE cell.

      i    ii   iii   iv   v    vi
      a    a    b     a    Δ    Δ    . . .

The vast majority (all but four) of the cells on the input TAPE are empty; that is, they
are loaded with blanks, ΔΔΔ . . . .

As we process this TAPE on the machine, we read one letter at a time and eliminate
each as it is used. When we reach the first blank cell, we stop. We always presume that once
the first blank is encountered, the rest of the TAPE is also blank. We read from left to right
and never go back to a cell that was read before.
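The reading discipline just described (left to right, each letter used once, blanks forever after the input) can be modeled in a few lines. This is a minimal Python sketch, not part of the text: the class name `InputTape` and the use of the string "Δ" for the blank are our own assumptions.

```python
BLANK = "Δ"  # assumption: the blank character is represented by the string "Δ"


class InputTape:
    """Half-infinite input TAPE: letters in cells i, ii, iii, ..., then blanks forever."""

    def __init__(self, word: str):
        self._cells = list(word)
        self._pos = 0  # index of the next cell to be read; it never moves backward

    def read(self) -> str:
        """Consume and return the next letter, or BLANK once the input is exhausted."""
        if self._pos < len(self._cells):
            letter = self._cells[self._pos]
            self._pos += 1
            return letter
        return BLANK  # every cell past the input holds a blank


tape = InputTape("aaba")
letters = [tape.read() for _ in range(6)]
print(letters)  # ['a', 'a', 'b', 'a', 'Δ', 'Δ']
```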

As part of our new pictorial representations for FAs, let us introduce the symbols 




to streamline the design of the machine. The arrows (directed edges) into or out of these 
states can be drawn at any angle. The START state is like a - state connected to another 
state in a TG by a Λ-edge. We begin the process there, but we read no input letter. We just
proceed immediately to the next state. A start state has no arrows coming into it. 

An ACCEPT state is a shorthand notation for a dead-end final state—once entered, it 
cannot be left, such as 





A REJECT state is a dead-end state that is not final: 


[Figure: a nonfinal state with a loop labeled “all letters”]
Because we have used the adjective “final” to apply only to accepting states in FAs, we call
the new ACCEPT and REJECT states “halt states.” Previously, we could pass through a final 
state if we were not finished reading the input data; halt states cannot be traversed. 

We are changing our diagrams of FAs so that every function a state performs is done by 
a separate box in the picture. The most important job performed by a state in an FA is to read 
an input letter and branch to other states depending on what letter has been read. To do this 
job from now on, we introduce the READ states. These are depicted as diamond-shaped 
boxes as shown below: 

[Figure: a diamond-shaped READ state with three exit edges:
   a: follow this path if what is read is an a
   b: follow this path if what is read is a b
   Δ: follow this path if a Δ was read, i.e., if the input string was empty or totally read]

Here again, the directions of the edges in the picture above show only one of the many 
possibilities. When the character Δ is read from the TAPE, it means that we are out of input
letters. We are then finished processing the input string. The Δ-edge will lead to ACCEPT if
the state we have stopped in is a final state and to REJECT if the processing stops in a state 
that is not a final state. In our old pictures for FAs, we never explained how we knew we 
were out of input letters. In these new pictures, we can recognize this fact by reading a blank 
from the TAPE. 

These suggestions have not altered the power of our machines. We have merely introduced
a new pictorial representation that will not alter their language-accepting abilities.

The FA that used to be drawn like 





(the FA that accepts all words ending in the letter a) becomes, in the new symbolism, the 
machine below: 
Notice that the edge from START needs no label because START reads no letter. All the
other edges do require labels. We have drawn the edges as straight-line segments, not curves
and loops as before. We have also used the electronic diagram notation for wires flowing
into each other. For example,


means 


Our machine is still an FA. The edges labeled Δ are not to be confused with Λ-labeled
edges. The Δ-edges lead only from READ boxes to halt states. We have just moved the +
and - signs out of the circles that used to indicate properties of states and into adjoining
ovals. The “states” are now only READ boxes and have no final/nonfinal status.

In the FA above, if we run out of input letters in the left READ state, we will find a Δ on
the INPUT TAPE and so take the Δ-edge to REJECT. Reading a Δ in a READ state that
corresponds to an FA final state, like the READ on the right, sends us to ACCEPT.
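The flowchart form translates naturally into a table of READ boxes whose exits include a Δ-edge to a halt state. Here is a minimal Python sketch of the FA for all words ending in a, assuming the alphabet {a, b}; the box names READ_LEFT and READ_RIGHT are hypothetical labels, since the boxes in the figure are unnamed.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"

# Each READ box maps the character read (a letter or the blank) to the next box.
# READ_LEFT / READ_RIGHT are our own names for the left and right READ boxes.
MACHINE = {
    "READ_LEFT":  {"a": "READ_RIGHT", "b": "READ_LEFT", BLANK: "REJECT"},
    "READ_RIGHT": {"a": "READ_RIGHT", "b": "READ_LEFT", BLANK: "ACCEPT"},
}


def run(word: str) -> bool:
    state = "READ_LEFT"                        # START reads nothing and leads straight here
    tape = iter(word)
    while state not in ("ACCEPT", "REJECT"):   # halt states cannot be traversed
        letter = next(tape, BLANK)             # an exhausted TAPE yields the blank
        state = MACHINE[state][letter]
    return state == "ACCEPT"


for w in ["", "a", "ba", "ab", "abba"]:
    print(repr(w), run(w))
```

Only the Δ-edges lead to halt states, mirroring the point that the boxes themselves no longer carry final/nonfinal status.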

Let us give another example of the new pictorial notation. 


EXAMPLE 


becomes 


These pictures look more like the “flowcharts” we are familiar with than the old pictures
for FAs did. The READ states are diamond-shaped because they are conditional branch
instructions. The general study of the flowchart as a mathematical structure is part of computer
theory, but beyond our intended scope.


ADDING A PUSHDOWN STACK 

The reason we bothered to construct new pictures for FAs (which had perfectly good
pictures already) is that it is now easier to make an addition to our machine called the
PUSHDOWN STACK, or PUSHDOWN STORE. This is a concept we may have already met in a
course on data structures.

A PUSHDOWN STACK is a place where input letters (or other information) can be
stored until we want to refer to them again. It holds the letters it has been fed in a long
column (as many letters as we want). The operation PUSH adds a new letter to the top of the
column. The new letter is placed on top of the STACK, and all the other letters are pushed
back (or down) accordingly. Before the machine begins to process an input string, the
STACK is presumed to be empty, which means that every storage location in it initially
contains a blank. If the STACK is then fed the letters a, b, c, d by the sequence of
instructions

PUSH a 
PUSH b 
PUSH c 
PUSH d 

then the top letter in the STACK is d, the second is c, the third is b, and the fourth is a. If we
now execute the instruction 

PUSH b 

the letter b will be added to the STACK on the top. The d will be pushed down to position 2, 
the c to position 3, the other b to position 4, and the bottom a to position 5. 
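The same sequence of PUSH instructions can be checked with a Python list whose last element plays the role of the top of the STACK. This is a minimal sketch; the list representation is our own choice, not the book's.

```python
# Model the STACK as a Python list whose last element is the top.
stack = []
for letter in ["a", "b", "c", "d"]:   # PUSH a, PUSH b, PUSH c, PUSH d
    stack.append(letter)
stack.append("b")                      # PUSH b

# List the STACK from the top down, so that position 1 is the top.
top_down = list(reversed(stack))
print(top_down)  # ['b', 'd', 'c', 'b', 'a']
```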

One pictorial representation of a STACK with these letters in it is shown below. Beneath
the bottom a, we presume that the rest of the STACK, which, like the INPUT TAPE, has
infinitely many storage locations, holds only blanks.

      STACK
        b
        d
        c
        b
        a
        Δ
        ⋮
The instruction to take a letter out of the STACK is called POP. This causes the letter on
the top of the STACK to be brought out of the STACK (popped). The rest of the letters are
moved up one location each. A PUSHDOWN STACK is called a LIFO file, which stands
for “the last in is the first out,” like a narrow crowded elevator. It is not like the normal
storage area of a computer, which allows random access (we can retrieve stuff from anywhere
regardless of the order in which it was fed). A PUSHDOWN STACK lets us read only the
top letter. If we want to read the third letter in the STACK, we must go POP, POP, POP, but
then we have additionally popped out the first two letters and they are no longer in the
STACK. We also have no simple instruction for determining the bottom letter in the STACK,
for telling how many b's are in the STACK, and so forth. The only STACK operations
allowed to us are PUSH and POP.

Popping an empty STACK, like reading an empty TAPE, gives us the blank character Δ.
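The two permitted operations, including the rule that popping an empty STACK yields the blank, can be sketched as a tiny Python class. The class name `PushdownStack` and the "Δ" string are illustrative assumptions; the point is that reaching the third letter costs the two letters above it.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"


class PushdownStack:
    """LIFO store offering only PUSH and POP; popping an empty STACK yields the blank."""

    def __init__(self):
        self._items = []  # the top of the STACK is the end of the list

    def push(self, letter: str) -> None:
        self._items.append(letter)

    def pop(self) -> str:
        return self._items.pop() if self._items else BLANK


s = PushdownStack()
for letter in "abcd":
    s.push(letter)

# To see the third letter from the top we must POP three times,
# and the two letters above it are gone afterwards.
third = [s.pop() for _ in range(3)][-1]
print(third)             # b
print(s.pop(), s.pop())  # a Δ
```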

We can add a PUSHDOWN STACK and the operations PUSH and POP to our new 
drawings of FAs by including as many as we want of the states

[Figure: a POP state]

and

[Figure: PUSH states, e.g., PUSH a and PUSH b]
The edges coming out of a POP state are labeled in the same way as the edges from a
READ state, one (for the moment) for each character that might appear in the STACK,
including the blank. Note that branching can occur at POP states but not at PUSH states. We
can leave PUSH states only by the one indicated route, although we can enter a PUSH state 
from any direction. 

When FAs have been souped up with a STACK and POP and PUSH states, we call them
pushdown automata, abbreviated PDAs. These PDAs were introduced by Anthony G. Oettinger
in 1961 and Marcel P. Schützenberger in 1963 and were further studied by Robert J.
Evey, also in 1963.

The notion of a PUSHDOWN STACK as a data structure had been around for a
while, but these mathematicians independently realized that when this memory structure
is incorporated into an FA, its language-recognizing capabilities are increased considerably.

The precise definition will follow soon, after a few examples. 
Consider the following PDA:

[Figure: PDA with START, an upper-left READ, PUSH a, a lower-left POP, a second READ, a right-side POP, ACCEPT, and REJECT]

Before we begin to analyze this machine in general, let us see it in operation on the input
string aaabbb. We begin by assuming that this string has been put on the TAPE. We always
start the operation of the PDA with the STACK empty as shown:

[Figure: empty STACK, all blanks]


We must begin at START. From there we proceed directly into the upper left READ, a 
state that reads the first letter of input. This is an a, so we cross it off the TAPE (it has been 
read) and we proceed along the a-edge from the READ state. This edge brings us to the 
PUSH a-state that tells us to push an a onto the STACK. Now the TAPE and STACK look 
like this: 


[Figure: TAPE aaabbb with its first a crossed off; STACK holding a above blanks]
The edge from the PUSH a-box takes us back to the line feeding into the same READ box,
so we return to this state. We now read another a and proceed as before along the a-edge to push
it into the STACK. Again, we are returned to the READ box. Again, we read an a (our third),
and, again, this a is pushed onto the STACK. The TAPE and STACK now look like this:

[Figure: TAPE with aaa crossed off; STACK holding a, a, a above blanks]



After the third PUSH a, we are routed back to the same READ state again. This time,
however, we read the letter b. This means that we take the b-edge out of this state down to
the lower left POP. Reading the b leaves the TAPE like this:

[Figure: TAPE with aaab crossed off, leaving bb]


The state POP takes the top element off the STACK. It is an a. It must be an a or a Δ
because the only letters pushed onto the STACK in the whole program are a's. If it were a Δ
or the impossible choice, b, we would have to go to the REJECT state. However, this time,
when we pop the STACK, we get the letter a out, leaving the STACK like this:

[Figure: STACK holding a, a above blanks]




Following the a-road from POP takes us to the other READ. The next letter on the 
TAPE to be read is a b. This leaves the TAPE like this: 


The b-road from the second READ state now takes us back to the edge feeding into the POP
state. So, we pop the STACK again and get another a. The STACK is now down to only one a:

[Figure: STACK holding a single a above blanks]
The a-line from POP takes us again to this same READ. There is only one letter left on
the input TAPE, a b. We read it and leave the TAPE empty, that is, all blanks. However, the
machine does not yet know that the TAPE is empty. It will discover this only when it next
tries to read the TAPE and finds a Δ:

[Figure: TAPE with all six letters crossed off]

The b that we just read loops us back into the POP state. We then take the last a from
the STACK, leaving it also empty, all blanks:

[Figure: empty STACK, all blanks]


The a takes us from POP to the right-side READ again. This time the only thing we can
read from the TAPE is a blank, Δ. The Δ-edge takes us to the other POP on the right side.
This POP now asks us to take a letter from the STACK, but the STACK is empty. Therefore,
we say that we pop a Δ.

This means that we must follow the Δ-edge, which leads straight to the halt state
ACCEPT. Therefore, the word aaabbb is accepted by this machine.

More than this can be observed. The language of words accepted by this machine is exactly 

{aⁿbⁿ, n = 0 1 2 . . .}

Let us see why. 

The first part of the machine, 




is a circuit of states that reads from the TAPE some number of a’s in a row and pushes them 
into the STACK. This is the only place in the machine where anything is pushed into the 
STACK. Once we leave this circuit, we cannot return, and the STACK contains everything it 
will ever contain. 

After we have loaded the STACK with all the a's from the front end of the input string,
we read yet another letter from the input TAPE. If this character is a Δ, it means that the
input word was of the form aⁿ, where n might have been 0 (i.e., some word in a*).

If this is the input, we take the Δ-line all the way to the right-side POP state. This tests the
STACK to see whether or not it has anything in it. If it has, we go to REJECT. If the STACK is
empty at this point, the input string must have been the null word, Λ, which we accept.

Let us now consider the other logical possibility, that after loading the front a's from the
input (whether there are many or none) onto the STACK, we read a b. This must be the first
b in the input string. It takes us to a new section of the machine into another small circuit.
On reading this first b, we immediately pop the STACK. The STACK can contain
some a's or only Δ's. If the input string started with a b, we would be popping the
STACK without ever having pushed anything onto it. We would then pop a Δ and go to
REJECT. If we pop a b, something impossible has happened. So, we go to REJECT and
call the repairperson. If we pop an a, we go to the lower right READ state that asks us to
read a new letter.

As long as we keep popping a's from the STACK to match the b's we are reading from
the TAPE, we circle between these two states happily: POP a, READ b, POP a, READ b. If
we pop a Δ from the STACK, it means that we ran out of STACK a's before the TAPE ran
out of input b's. This Δ-edge brings us to REJECT. Because we entered this two-state circuit
by reading a b from the TAPE before popping any a's, if the input is a word of the form aⁿbⁿ,
then the b's will run out first.

If while looping around this circuit, we hit an a on the TAPE, the READ state sends us 
to REJECT because this means the input is of the form 

(some a’s) (some b’s) (another a) . . . 

We cannot accept any word in which we come to an a after having read the first b. To
get to ACCEPT, the second READ state must read a blank and send us to the second POP
state. Reading this blank means that the word ends after its clump of b's. All the words
accepted by this machine must therefore be of the form a*b* but, as we shall now see, only
some of these words successfully reach the halt state ACCEPT.

Eventually, the TAPE will run out of letters and the READ state will turn up a blank. An
input word of the form aⁿbⁿ puts n a's into the STACK. The first b read then takes us to the
second circuit. After n trips around this circuit, we have popped the last a from the STACK
and have read the other (n − 1) b's and a blank from the TAPE. We then exit this section to
go to the last test.

We have exhausted the TAPE's supply of b's, so we should check to see
that the STACK is empty. We want to be sure we pop a Δ; otherwise, we reject the word
because there must have been more a's in the front than b's in the back. For us to get to
ACCEPT, both TAPE and STACK must empty together. Therefore, the set of words this
PDA accepts is exactly the language

{aⁿbⁿ, n = 0 1 2 3 . . .} ■
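The walkthrough above pins down every edge of this PDA, so it can be transcribed into a table-driven simulator. What follows is a minimal Python sketch under stated assumptions: the box names READ1, POP1, READ2, and POP2 are hypothetical labels for the unnamed boxes in the figure, the blank is the string "Δ", and the input is assumed to be a word over {a, b}.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"

# Transition table read off the walkthrough in the text.
# A tuple ("PUSH", letter, next_box) stands for an edge into a PUSH box.
PDA = {
    "READ1": {"a": ("PUSH", "a", "READ1"), "b": "POP1", BLANK: "POP2"},
    "POP1":  {"a": "READ2", "b": "REJECT", BLANK: "REJECT"},
    "READ2": {"a": "REJECT", "b": "POP1", BLANK: "POP2"},
    "POP2":  {"a": "REJECT", "b": "REJECT", BLANK: "ACCEPT"},
}


def accepts(word: str) -> bool:
    """Run the deterministic PDA on a word over {a, b}."""
    tape = iter(word)
    stack = []                      # top of the STACK is the end of the list
    box = "READ1"                   # START reads nothing and leads straight here
    while box not in ("ACCEPT", "REJECT"):
        if box.startswith("READ"):
            ch = next(tape, BLANK)  # an exhausted TAPE yields blanks
        else:
            ch = stack.pop() if stack else BLANK  # popping an empty STACK yields a blank
        nxt = PDA[box][ch]
        if isinstance(nxt, tuple):  # PUSH box: push, then follow its single exit edge
            _, letter, nxt = nxt
            stack.append(letter)
        box = nxt
    return box == "ACCEPT"


print([w for w in ["", "ab", "aab", "aabb", "ba"] if accepts(w)])  # ['', 'ab', 'aabb']
```

Because every box has exactly one exit per character, each input follows a single path, so the loop always halts.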

In the example above, we said that an a was read and then it was pushed onto the 
STACK. In reality (such as it is), the a that was read was consumed by traversing the a-edge. 
What was pushed was an unrelated a. PUSH states create matter out of thin air; they are not 
limited to what is read from the TAPE. 

We have already shown that the language accepted by the PDA above could not be
accepted by any FA, so pushdown automata are more powerful than finite automata. We can
say more powerful because all regular languages can be accepted by some PDA because they 
can be accepted by some FA and an FA (in the new notation) is exactly like a PDA that never 
uses its STACK. Propriety dictates that we not present the formal proof of this fact until after 
we give the formal definition of the terms involved. We soon present the definition of PDAs 
(p. 307). 

Let us take a moment to consider what makes these machines more powerful than FAs. 
The reason is that even though they too have only finitely many states to roam among, they 
do have an unlimited capacity for memory. It is a memory with restricted access but memory 
nonetheless. They can know where they have been and how often. The reason no FA could
accept the language {aⁿbⁿ} was that for large enough n, the aⁿ part had to run around in a
circuit and the machine could not keep track of how many times it had looped around. It could
therefore not distinguish between aⁿbⁿ and some aᵐbⁿ. However, the PDA has a primitive
memory unit. It can keep track of how many a's are read at the beginning.

Is this mathematical model then as powerful as a whole computer? Not quite, but that 
goal will be reached eventually. 

There are two points we must discuss. The first is that we need not restrict ourselves to 
using the same alphabet for input strings as we use for the STACK. In the example above, 
we could have read an a from the TAPE and then pushed an X into the STACK and let the 
X’s count the number of a’s. In this case, when we test the STACK with a POP state, we 
branch on X or Δ. The machine would then look like this:



We have drawn this version of the PDA with some minor variations of display but no 
substantive change in function. 
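The X-counting version can be sketched the same way. To show that the STACK alphabet {X} need not match the input alphabet {a, b}, this Python sketch is written with two loops, one per circuit, rather than with a transition table. The function name and the "Δ" blank are our own assumptions.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"


def accepts_with_x(word: str) -> bool:
    """a^n b^n recognizer whose STACK alphabet is {X}, not the input alphabet {a, b}."""
    tape = iter(word)
    stack = []
    ch = next(tape, BLANK)
    while ch == "a":          # READ / PUSH X circuit: one X for each a read
        stack.append("X")
        ch = next(tape, BLANK)
    while ch == "b":          # POP / READ circuit: each b must pop an X
        if not stack:         # popped a blank: more b's than a's
            return False
        stack.pop()
        ch = next(tape, BLANK)
    if ch != BLANK:           # an a after the b's (or a letter outside {a, b})
        return False
    return not stack          # the final POP must find a blank
```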
EXAMPLE

Let us introduce the language PALINDROMEX of all words of the form 

s X reverse(s) 

where s is any string in (a + b)*. The words in this language are

{X aXa bXb aaXaa abXba baXab bbXbb aaaXaaa aabXbaa . . .}

All these words are palindromes in that they read the same forward and backward. They
all contain exactly one X, and this X marks the middle of the word. We can build a
deterministic PDA that accepts the language PALINDROMEX. Surprisingly, it has the same basic
structure as the PDA we had for the language {aⁿbⁿ}.

In the first part of the machine, the STACK is loaded with the letters from the input
string just as the initial a's from aⁿbⁿ were pushed onto the STACK. Conveniently for us, the
letters go into the STACK first letter on the bottom, second letter on top of it, and so on until
the last letter pushed in ends up on top. When we read the X, we know we have reached the
middle of the input. We can then begin to compare the front half of the word (which is
reversed in the STACK) with the back half (still on the TAPE) to see that they match.

We begin by storing the front half of the input string in the STACK with this part of the 
machine: 



If we READ an a, we PUSH an a. If we READ a b, we PUSH a b, and on and on until 
we encounter the X on the TAPE. 

After we take the first half of the word and stick it into the STACK, we have reversed 
the order of the letters and it looks exactly like the second half of the word. For example, if 
we begin with the input string 

abbXbba 

then at the moment we are just about to read the X, we have 



Is it not amazing how palindromes seem perfect for PUSHDOWN STACKs? 
When we read the X, we do not put it into the STACK. It is used up in the process of
transferring us to phase two. This is where we compare what is left on the TAPE with what
is in the STACK. In order to reach ACCEPT, these two should be the same letter for letter,
down to the blanks.

If we read an a, we had better pop an a (pop anything else and we REJECT); if we read a b,
we had better pop a b (anything else and we REJECT); and if we read a blank, we had better
pop a blank; when we do, we accept. If we ever read a second X, we also go to REJECT.

The machine we have drawn is deterministic. The input alphabet here is Σ = {a b X}, so
each READ state has four edges coming out of it (a, b, X, and Δ).

The STACK alphabet has two letters Γ = {a b}, so each POP has three edges coming
out of it (a, b, and Δ). At each READ and each POP, there is only one direction the input can
take. Each string on the TAPE generates a unique path through this PDA.
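The two phases described here can be sketched in Python. Phase one pushes every letter read until the X. Phase two reads a letter, pops a letter, and rejects on any mismatch, so reading and popping blanks together is the only way to accept. The function name and the "Δ" blank are illustrative assumptions, and a missing edge is treated as REJECT.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"


def accepts_palindromex(word: str) -> bool:
    tape = iter(word)
    stack = []
    # Phase one: READ a -> PUSH a, READ b -> PUSH b, until the X is read.
    ch = next(tape, BLANK)
    while ch in ("a", "b"):
        stack.append(ch)
        ch = next(tape, BLANK)
    if ch != "X":             # a blank (or stray letter) before any X: REJECT
        return False
    # The X itself is not pushed. Phase two: each letter read must equal the letter popped.
    while True:
        ch = next(tape, BLANK)
        top = stack.pop() if stack else BLANK
        if ch != top:         # includes reading a second X, which can never match
            return False
        if ch == BLANK:       # read a blank and popped a blank together: ACCEPT
            return True
```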

We can draw a less complicated picture for this PDA without the REJECT states if we 
do not mind having an input string crash when it has no path to follow. 

The whole PDA (without REJECTS) is pictured below: 



EXAMPLE 

Let us now consider what kind of PDA could accept the language ODDPALINDROME.
This is the language of all strings of a's and b's that are palindromes and have an odd
number of letters. The words in this language are just like the words in PALINDROMEX except
that the middle letter X has been changed into an a or a b.

ODDPALINDROME = {a b aaa aba bab bbb . . .}
The problem here is that the middle letter does not stand out, so it is harder to recognize
where the first half ends and the second half begins. In fact, it is not only harder; it is
impossible. A PDA, just like an FA, reads the input string sequentially from left to right and has no
idea at any stage how many letters remain to be read. In PALINDROMEX, we knew that X
marked the spot; now we have lost our treasure map. If we accidentally push into the
STACK even one letter too many, the STACK will be larger than what is left on the TAPE
and the front and back will not match. The algorithm we used to accept PALINDROMEX
cannot be used without modification to accept ODDPALINDROME. We are not completely
lost, though. The algorithm can be altered to fit our needs by introducing one
nondeterministic jump. That we choose this approach does not mean that there is not a completely different
method that might work deterministically, but the introduction of nondeterminism here
seems quite naturally suited to our purpose.

Consider 



This machine is the same as the previous machine except that we have changed the X 
into the choice: a or b. 

The machine is now nondeterministic because the left READ state has two choices for 
exit edges labeled a and two choices for b. 

If we branch at the right time (exactly at the middle letter) along the former X-edge, we 
can accept all words in ODDPALINDROME. If we do not choose the right edge at the right 
time, the input string will be rejected even if it is in ODDPALINDROME. Let us recall, 
however, that for a word to be accepted by a nondeterministic machine (NFA, TG, or PDA), 
all that is necessary is that some choice of edges does lead to ACCEPT. 

For every word in ODDPALINDROME, if we make the right choices, the path does 
lead to acceptance. 
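"Some choice of edges leads to ACCEPT" can be checked mechanically by exploring every path. This minimal Python sketch follows the machine as described in the text (the PALINDROMEX machine with the X-edge replaced by a or b). Each configuration records the current READ box, the next TAPE position, and the STACK. The phase names READ1 and READ2 are our own labels, and a missing edge is treated as a crash.

```python
BLANK = "Δ"  # assumption: the blank is represented by the string "Δ"


def accepts_oddpalindrome(word: str) -> bool:
    """Explore every path through the nondeterministic PDA; accept if any reaches ACCEPT."""
    def letter_at(i):
        return word[i] if i < len(word) else BLANK

    # A configuration is (box, next tape position, stack as a tuple with the top at the end).
    frontier = [("READ1", 0, ())]
    while frontier:
        box, i, stack = frontier.pop()
        ch = letter_at(i)
        if box == "READ1":
            if ch in ("a", "b"):
                frontier.append(("READ1", i + 1, stack + (ch,)))  # choice 1: PUSH the letter
                frontier.append(("READ2", i + 1, stack))          # choice 2: it was the middle letter
            # a blank read in READ1 has no path to ACCEPT
        else:  # READ2, then the POP that must match the character just read
            top = stack[-1] if stack else BLANK
            if ch != top:
                continue                                          # this path crashes
            if ch == BLANK:
                return True                                       # blanks read and popped together
            frontier.append(("READ2", i + 1, stack[:-1]))
    return False
```

Every step advances the TAPE position, so the search is finite even though the machine itself guesses.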

The word aba can be accepted by this machine if it follows the dotted path: 
It will be rejected if it tries to push two, three, or no letters into the STACK before
taking the right-hand branch to the second READ state.

We present a better method of tracking the action of a word on a PDA in the next
example.

Let us now consider a slightly different language. 


EXAMPLE 

Recall the language 

EVENPALINDROME = {s reverse(s), where s is in (a + b)*}

               = {Λ aa bb aaaa abba baab bbbb aaaaaa . . .}

This is the language of all palindromes with an even number of letters. 

One machine to accept this language is pictured below: 





We have labeled the READ states 1 and 2 and the POP states 1, 2, and 3 so that we can
identify them in discussion. These numbers do not indicate that we are to READ or POP
more than one letter. They are only labels. Soda-POP, grand-POP, and POP-corn would do as
well. The names will help us trace the path of an input string through the machine.

This machine is nondeterministic. At READ₁, when we read an a from the TAPE, we
have the option of following an a-edge to PUSH a or an a-edge to POP₁. If we read a b in
READ₁, we also have two alternatives: to go to PUSH b or to go to POP₂. If we read a Δ in
READ₁, we have only one choice: to go to POP₃.

Let us take notice of what we have done here. In the PDA for PALINDROMEX, the
X-edge took us into a second circuit, one that had the following form: read from
TAPE → compare with STACK → read from TAPE → compare with STACK . . . . In
this machine, we begin the process of “read from TAPE → compare with STACK” in
READ₁. The first letter of the second half of the word is read in READ₁; then we immediately
go to the POP that compares the character read with what is on top of the STACK. After
this, we cycle READ₂ → POP → READ₂ → POP → . . . until both run out of letters
simultaneously.
It will be easier to understand this machine once we see it in action. Let us run the string
babbab. Initially, we have

TAPE    b a b b a b Δ · · ·
STACK   Δ · · ·

We can trace the path by which this input can be accepted by the successive rows in the 
table below: 


STATE     STACK       TAPE (letters not yet read)
START     Δ···        babbabΔ···
READ₁     Δ···        abbabΔ···
PUSH b    bΔ···       abbabΔ···
READ₁     bΔ···       bbabΔ···
PUSH a    abΔ···      bbabΔ···
READ₁     abΔ···      babΔ···
PUSH b    babΔ···     babΔ···
READ₁     babΔ···     abΔ···


If we are going to accept this input string, this is where we must make the jump out of 
the left circuit into the right circuit. The trace continues: 


STATE      STACK        TAPE
POP₂       abΛ . . .    b̶a̶b̶b̶ab Λ . . .
READ₂      abΛ . . .    b̶a̶b̶b̶a̶b Λ . . .
POP₁       bΛ . . .     b̶a̶b̶b̶a̶b Λ . . .
READ₂      bΛ . . .     b̶a̶b̶b̶a̶b̶ Λ . . .
POP₂       Λ . . .      b̶a̶b̶b̶a̶b̶ Λ . . .
READ₂      Λ . . .      b̶a̶b̶b̶a̶b̶ Λ̶ . . .
           (We have just read the first of the infinitely many blanks on the TAPE.)
POP₃       Λ . . .      b̶a̶b̶b̶a̶b̶ Λ̶ . . .
           (Popping a blank from an empty stack still leaves blanks.)
           (Reading a blank from an empty tape still leaves blanks.)
ACCEPT     Λ . . .      b̶a̶b̶b̶a̶b̶ Λ̶ . . .
Notice that to facilitate the drawing of this table, we have rotated the STACK so that it
reads left to right instead of top to bottom.

Because this is a nondeterministic machine, there are other paths this input could have 
taken. However, none of them leads to acceptance. 
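The claim that exactly one path accepts babbab can be checked mechanically. Below is a minimal Python sketch, assuming the machine's edges are exactly as described in the text: READ₁ branches on each letter either to a PUSH (and back to READ₁) or to the matching POP, and goes to POP₃ on Λ; READ₂ goes to POP₁ on a, POP₂ on b, and POP₃ on Λ. It follows every path of the nondeterministic machine and counts those ending in ACCEPT. The state names and the function `accepting_paths` are our own, and each READ₁ → PUSH x → READ₁ loop is folded into one step.

```python
BLANK = "Λ"  # the blank character on the TAPE and at the bottom of the STACK


def letter_at(tape, pos):
    """Return the letter in TAPE cell `pos`, or a blank past the end of the input."""
    return tape[pos] if pos < len(tape) else BLANK


def accepting_paths(word, state="READ1", pos=0, stack=()):
    """Count the paths from `state` to ACCEPT in the EVENPALINDROME PDA.

    The STACK is a tuple whose last element is the top.  POP_a and POP_b stand
    for POP1 (expects a) and POP2 (expects b); POP3 expects a blank.
    """
    if state == "ACCEPT":
        return 1
    top = stack[-1] if stack else BLANK
    if state == "READ1":
        ch = letter_at(word, pos)
        if ch == BLANK:
            return accepting_paths(word, "POP3", pos + 1, stack)
        # Choice 1: READ1 -> PUSH ch -> READ1 (still in the first half).
        total = accepting_paths(word, "READ1", pos + 1, stack + (ch,))
        # Choice 2: guess that ch is the first letter of the second half.
        total += accepting_paths(word, "POP_" + ch, pos + 1, stack)
        return total
    if state == "READ2":
        ch = letter_at(word, pos)
        nxt = "POP3" if ch == BLANK else "POP_" + ch
        return accepting_paths(word, nxt, pos + 1, stack)
    if state in ("POP_a", "POP_b"):
        if top != state[-1]:
            return 0  # CRASH: no edge out of this POP for the character popped
        return accepting_paths(word, "READ2", pos, stack[:-1])
    if state == "POP3":
        return accepting_paths(word, "ACCEPT", pos, stack[:-1]) if top == BLANK else 0
    raise ValueError(f"unknown state {state}")
```

For babbab the count is 1: only the jump to POP₂ on the fourth letter survives, which is why the nondeterministic guess must be made exactly in the middle of the word.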

Below we trace an unsuccessful path: 


STATE      STACK      TAPE
START      Λ          babbab
READ₁      Λ          b̶abbab    (We had no choice but to go here.)
PUSH b     b          b̶abbab    (We could have chosen to go to POP₂ instead.)
                                (We know there are infinitely many blanks underneath this b.)
                                (Notice that the TAPE remains unchanged except by READ statements.)
READ₁      b          b̶a̶bbab    (We had no choice but to go here from PUSH b.)
POP₁       Λ          b̶a̶bbab    (Here, we exercised bad judgment and made a poor choice; PUSH a would have been better.)
                                (When we pop the b, what is left is all Λ's.)
CRASH                           (This means that when we were in POP₁ and found a b on top of the STACK, we tried to take the b-edge out of POP₁. However, there is no b-edge out of POP₁.)


Another unsuccessful approach to accepting the input babbab is to loop around the circuit
READ₁ → PUSH six times until the whole string has been pushed onto the STACK. After
this, a Λ will be read from the TAPE and we have to go to POP₃. This POP will ask if the
STACK is empty. It will not be, so the path will CRASH right here.

The word Λ is accepted by this machine through the sequence

START → READ₁ → POP₃ → ACCEPT


As above, we shall not put all the ellipses (. . .) into the tables representing traces. We
understand that the TAPE has infinitely many blanks on it without having to write

babbab Λ Λ Λ . . .

We shall see later why it is necessary to define PDAs as nondeterministic machines. 

In constructing our new machines, we had to make several architectural decisions.
Should we include a memory device? Yes. Should it be a stack, queue, or random
access? A stack. One stack or more? One. Deterministic? No. Finitely many states?
Yes. Can we write on the INPUT TAPE? No. Can we reread the input? No. Remember
that we are not trying to discover the structure of a naturally occurring creature; we are
con-coders trying to invent a CFL-recognizing machine. The test of whether our decisions are
correct will come in the next chapter.

DEFINING THE PDA 

We can now give the full definition of PDAs. 

DEFINITION 

A pushdown automaton, PDA, is a collection of eight things: 

1. An alphabet Σ of input letters.

2. An input TAPE (infinite in one direction). Initially, the string of input letters is placed on
the TAPE starting in cell i. The rest of the TAPE is blank.

3. An alphabet Γ of STACK characters.

4. A pushdown STACK (infinite in one direction). Initially, the STACK is empty (contains 
all blanks). 

5. One START state that has only out-edges, no in-edges: 



6. Halt states of two kinds: some ACCEPT and some REJECT. They have in-edges and no 
out-edges: 



7. Finitely many nonbranching PUSH states that introduce characters onto the top of the 
STACK. They are of the form 



where X is any letter in Γ.

8. Finitely many branching states of two kinds: 

(i) States that read the next unused letter from the TAPE 



which may have out-edges labeled with letters from Σ and the blank character Λ,
with no restrictions on duplication of labels and no insistence that there be a label
for each letter of Σ, or Λ.

(ii) States that read the top character of the STACK



which may have out-edges labeled with the letters of Γ and the blank character Λ,
again with no restrictions.

We further require that the states be connected so as to become a connected directed
graph.

To run a string of input letters on a PDA means to begin from the START state and fol-
low the unlabeled edges and those labeled edges that apply (making choices of edges when
necessary) to produce a path through the graph. This path will end either at a halt state or
will crash in a branching state when there is no edge corresponding to the letter/character
read/popped. When letters are read from the TAPE or characters are popped from the
STACK, they are used up and vanish.

An input string with a path that ends in ACCEPT is said to be accepted. An input string
that can follow a selection of paths is said to be accepted if at least one of these paths leads
to ACCEPT. The set of all input strings accepted by a PDA is called the language accepted
by the PDA, or the language recognized by the PDA.
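The definition translates almost directly into a search over configurations (state, TAPE position, STACK contents). The following Python sketch is one way to do it, under our own assumptions: the PDA is a dictionary from state names to a kind (START, READ, POP, PUSH, ACCEPT, REJECT) plus its out-edges, and because paths may be infinite, the search abandons any path longer than `max_steps` edges. The encoding and the names `accepts` and `EVEN_PAL` are hypothetical, not the book's notation.

```python
from collections import deque

BLANK = "Λ"

# Hypothetical encoding of the EVENPALINDROME machine of this chapter.
# READ and POP states list (label, target) pairs; duplicate labels are allowed.
EVEN_PAL = {
    "START": ("START", ["READ1"]),            # unlabeled out-edges
    "READ1": ("READ", [("a", "PUSHa"), ("a", "POP1"),
                       ("b", "PUSHb"), ("b", "POP2"), (BLANK, "POP3")]),
    "PUSHa": ("PUSH", ("a", "READ1")),
    "PUSHb": ("PUSH", ("b", "READ1")),
    "POP1": ("POP", [("a", "READ2")]),
    "POP2": ("POP", [("b", "READ2")]),
    "READ2": ("READ", [("a", "POP1"), ("b", "POP2"), (BLANK, "POP3")]),
    "POP3": ("POP", [(BLANK, "ACCEPT")]),
    "ACCEPT": ("ACCEPT", None),
}


def accepts(pda, word, max_steps=10_000):
    """Return True if some path for `word` reaches ACCEPT within max_steps edges."""
    start = next(name for name, (kind, _) in pda.items() if kind == "START")
    frontier = deque([(start, 0, (), 0)])  # (state, TAPE position, STACK, edges used)
    seen = set()
    while frontier:
        state, pos, stack, steps = frontier.popleft()
        if (state, pos, stack) in seen or steps > max_steps:
            continue
        seen.add((state, pos, stack))
        kind, edges = pda[state]
        if kind == "ACCEPT":
            return True
        if kind == "START":
            for target in edges:
                frontier.append((target, pos, stack, steps + 1))
        elif kind == "READ":
            # Past the end of the input the TAPE holds only blanks.
            letter = word[pos] if pos < len(word) else BLANK
            for label, target in edges:
                if label == letter:
                    frontier.append((target, pos + 1, stack, steps + 1))
        elif kind == "POP":
            # Popping an empty STACK yields a blank and leaves it empty.
            top = stack[-1] if stack else BLANK
            for label, target in edges:
                if label == top:
                    frontier.append((target, pos, stack[:-1], steps + 1))
        elif kind == "PUSH":
            char, target = edges
            frontier.append((target, pos, stack + (char,), steps + 1))
        # REJECT, or a READ/POP with no matching edge: this path crashes.
    return False
```

A `False` answer means only that no accepting path was found within the bound; for machines with infinite paths that is weaker than a proof that the word is rejected.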

We should make a careful note of the fact that we have allowed more than one exit edge
from the START state. Because the edges are unlabeled, this branching has to be nondeter-
ministic. We could have restricted the START state to only one exit edge. This edge could
immediately lead into a PUSH state in which we would add some arbitrary symbol to the
STACK, say, a Weasel. The PUSH Weasel would then lead into a POP state having several
edges coming out of it all labeled Weasel. POP goes the Weasel, and we make our nondeter-
ministic branching. Instead of this charade, we allow the START state itself to have several
out-edges.

Even though these are nondeterministic like TGs, unlike TGs we do not allow edges to
be labeled with words, only with single characters. Nor do we allow Λ-edges in the TG sense.
Edges labeled with Λ are completely different: they mean that a blank was read from the TAPE
or popped from the STACK.

We have not specified, as some authors do, that the STACK has to be empty at the
time of accepting a word. Some go so far as to define acceptance by the STACK condi-
tion, as opposed to halt states. We shall address this point with a theorem later in this
chapter.


EXAMPLE 

Consider the language generated by the CFG 

S → S + S | S*S | 4

The terminals are +, *, and 4 and the only nonterminal is S. 
The following PDA accepts this language: 
This is a funny-looking PDA with one POP, four READs, and seven PUSHs.

Instead of proving that this machine accepts exactly the language generated by this 
CFG, we only trace the acceptance of the string 

4 + 4*4 

This machine offers plenty of opportunity for making nondeterministic choices, almost 
all of them disastrous. The path we illustrate is one to acceptance. 


STATE      STACK    TAPE
START      Λ        4 + 4*4
PUSH₁ S    S        4 + 4*4
POP        Λ        4 + 4*4
PUSH₂ S    S        4 + 4*4
PUSH₃ +    +S       4 + 4*4
PUSH₄ S    S+S      4 + 4*4
POP        +S       4 + 4*4
READ₁      +S       + 4*4
POP        S        + 4*4
READ₂      S        4*4
POP        Λ        4*4
PUSH₅ S    S        4*4
PUSH₆ *    *S       4*4
PUSH₇ S    S*S      4*4
POP        *S       4*4
READ₁      *S       *4
POP        S        *4
READ₃      S        4
POP        Λ        4
READ₁      Λ        Λ
POP        Λ        Λ
READ₄      Λ        Λ
ACCEPT     Λ        Λ


Note that this time we have erased the TAPE letters read instead of striking them. 
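The trace can be reproduced by a short recursive search. This Python sketch assumes the edges implied by the trace above (our reading of the pictured machine): POP on S branches into the chain PUSH S, PUSH +, PUSH S, into the chain PUSH S, PUSH *, PUSH S, or into READ₁, which must read 4; POP on + or * leads to READ₂ or READ₃, which must read that same character; POP on Λ leads through READ₄ to ACCEPT. The pruning rule that stops endless S-expansions is our addition, not part of the machine, and the function name is hypothetical.

```python
def accepts_expr(word):
    """Search the PDA for S -> S + S | S*S | 4 on a word over {4, +, *}.

    The STACK is a tuple whose last element is the top.  Spaces are not
    letters, so "4 + 4*4" is passed as "4+4*4".
    """
    def run(pos, stack):
        if not stack:                        # POP finds Λ; READ4 must find Λ
            return pos == len(word)
        # Every symbol left on the STACK consumes at least one TAPE letter,
        # so a STACK taller than the unread input cannot reach ACCEPT.
        if len(stack) > len(word) - pos:
            return False
        top, rest = stack[-1], stack[:-1]
        if top == "S":
            # PUSH S, PUSH op, PUSH S: the new top is again S.
            for op in "+*":
                if run(pos, rest + ("S", op, "S")):
                    return True
            # READ1 must find a 4.
            return pos < len(word) and word[pos] == "4" and run(pos + 1, rest)
        # top is + or *: READ2 or READ3 must find that same character.
        return pos < len(word) and word[pos] == top and run(pos + 1, rest)

    return run(0, ("S",))
```

Most branches of the search die quickly, which matches the remark that almost all of the nondeterministic choices are disastrous.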

THEOREM 28 

For every regular language L, there is some PDA that accepts it. 


PROOF 

We have actually discussed this matter already, but we could not formally prove anything | 
until we had settled on the definition of a PDA. 

Because L is regular, it is accepted by some FA. The constructive algorithm for convert-
ing an FA into an equivalent PDA was presented at the beginning of this chapter. ■
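For concreteness, here is a minimal Python sketch of one such conversion, assuming a deterministic FA given as a transition dictionary. Each FA state becomes a READ state, each FA edge keeps its letter, and each final state gets a Λ-edge to ACCEPT; the STACK is never touched. The function names are our own, and the sketch illustrates the idea rather than transcribing the construction given earlier in the chapter.

```python
BLANK = "Λ"


def fa_to_pda(delta, start, finals):
    """Turn a deterministic FA into READ states of a PDA that never uses its STACK.

    delta maps (state, letter) -> state.  Returns the name of the first READ
    state and a dict READ state -> list of (label, target) edges.
    """
    reads = {}
    for (q, letter), r in delta.items():
        reads.setdefault(f"READ_{q}", []).append((letter, f"READ_{r}"))
    for q in finals:
        # Reaching the end of the input (a blank) in a final state means ACCEPT.
        reads.setdefault(f"READ_{q}", []).append((BLANK, "ACCEPT"))
    return f"READ_{start}", reads


def run_reads(first, reads, word):
    """Follow the (deterministic) READ edges for the word, then one blank."""
    state = first
    for letter in list(word) + [BLANK]:
        targets = [t for label, t in reads.get(state, []) if label == letter]
        if not targets:
            return False  # crash: no edge for this letter
        state = targets[0]
    return state == "ACCEPT"
```

For example, with the FA for words ending in a, `delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}`, start state 0 and final state 1, the word ba is accepted and ab is not.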

One important difference between a PDA and an FA is the length of the path formed by
a given input. If a string of seven letters is fed into an FA, it follows a path exactly seven
edges long. In a PDA, the path could be longer or shorter. The PDA below accepts the regu-
lar language of all words beginning with an a. But no matter how long the input string, the
path is only one or two edges long.







Because we can continue to process the blanks on the TAPE even after all input letters
have been read, we can have arbitrarily long or even infinite paths caused by very short input
words. For example, the following PDA accepts only the word b, but it must follow a seven-
edge path to acceptance:



The following machine accepts all words that start with an a in a path of two edges and 
loops forever on any input starting with a b. (We can consider this an infinite path if we so 
desire.) 



We shall be more curious about the consequences of infinite paths later. 
The following result will be helpful to us in the next chapter. 


THEOREM 29 

Given any PDA, there is another PDA that accepts exactly the same language with the addi-
tional property that whenever a path leads to ACCEPT, the STACK and the TAPE contain
only blanks.


PROOF 

We present a constructive algorithm that will convert any PDA into a PDA with the property 
mentioned. 

Whenever we have the machine part 


ACCEPT 


we replace it with the following diagram: 
Technically speaking, we should have labeled the top loop “any letter in Σ” and the
bottom loop “any character in Γ.”

The new PDA formed accepts exactly the same language and finishes all successful paths
with empty TAPE and empty STACK.
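The replacement can also be phrased as a transformation of the machine's graph. In this Python sketch the PDA encoding (a state name mapped to its kind and out-edges) and the new state names CLEAN_READ and CLEAN_POP are our own assumptions. Every edge that used to enter an ACCEPT state is redirected into a READ loop over Σ, then a POP loop over Γ, and only a blank on both leads to the single ACCEPT.

```python
BLANK = "Λ"


def add_cleanup(pda, sigma, gamma):
    """Redirect every edge into ACCEPT through a TAPE-emptying READ loop and a
    STACK-emptying POP loop, in the spirit of the proof of Theorem 29.

    pda maps a state name to (kind, edges): START has a list of targets,
    READ/POP have (label, target) pairs, PUSH has (character, target).
    The new state names CLEAN_READ and CLEAN_POP are assumed to be unused.
    """
    old_accepts = {name for name, (kind, _) in pda.items() if kind == "ACCEPT"}

    def redirect(target):
        return "CLEAN_READ" if target in old_accepts else target

    new = {}
    for name, (kind, edges) in pda.items():
        if kind == "ACCEPT":
            continue                      # replaced by the single ACCEPT below
        if kind == "START":
            new[name] = (kind, [redirect(t) for t in edges])
        elif kind == "PUSH":
            char, target = edges
            new[name] = (kind, (char, redirect(target)))
        elif kind in ("READ", "POP"):
            new[name] = (kind, [(label, redirect(t)) for label, t in edges])
        else:                             # REJECT has no out-edges
            new[name] = (kind, edges)
    # Top loop: any letter of Σ returns to the same READ; a blank moves on.
    new["CLEAN_READ"] = ("READ", [(x, "CLEAN_READ") for x in sorted(sigma)]
                         + [(BLANK, "CLEAN_POP")])
    # Bottom loop: any character of Γ returns to the same POP; a blank accepts.
    new["CLEAN_POP"] = ("POP", [(c, "CLEAN_POP") for c in sorted(gamma)]
                        + [(BLANK, "ACCEPT")])
    new["ACCEPT"] = ("ACCEPT", None)
    return new
```

Because the two loops only ever consume letters and characters that are already there, any path that reached ACCEPT before still reaches it, now with TAPE and STACK both blank.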

PROBLEMS

In Problems 1 and 2, convert the following FAs into equivalent PDAs. 



2 . 
For Problems 3 and 4, consider the deterministic PDA:



3. Using a trace table like those in this chapter, show what happens to the INPUT 
TAPE and STACK as each of the following words proceeds through the machine: 

(i) abb 

(ii) abab 

(iii) aabb 

(iv) aabbbb 

4. (i) What is the language accepted by this PDA? 

(ii) Find a CFG that generates this language. 

(iii) Is this language regular? 
5. Consider the following PDA:




Trace the following words on this PDA: 

(i) aaabbb 

(ii) aaabab 

(iii) aaabaa 

(iv) aaaabb 


6. (i) Prove that the language accepted by the machine in Problem 5 is 

L = {aⁿS, where S starts with b and length(S) = n}


(ii) Find a CFG that defines the language in part (i). 

(iii) Prove that the language of the machine in Problem 5 is not regular. 
Consider the following PDA:



7. (i) This PDA is deterministic, so it should be no problem to trace the inputs aababb
and abbbaaab on it. Show that they lead to ACCEPT.

(ii) Explain how this machine accepts the language {aⁿbᵐaᵐbⁿ, where n and m are inde-
pendent integers ≥ 1}.

8. (i) Show that the language aⁿbᵐaᵐbⁿ is context-free.

(ii) Show that this language is nonregular. 

For Problems 9 through 11, consider the following nondeterministic PDA: 



In this machine, REJECT occurs when a string crashes. Notice here that the
STACK alphabet is Γ = {x}.

9. (i) Show that the string ab can be accepted by this machine by taking the branch from
READ₁ to POP₁ at the correct time.

(ii) Show that the string bbba can also be accepted by giving the trace that shows when 
to take the branch. 
10. Show that this PDA accepts the language of all words with an even number of letters
(excluding Λ). Remember, it is also necessary to show that all words with odd length
can never lead to ACCEPT.

11. Here we have a nondeterministic PDA for a language that could have been accepted by
an FA. Find such an FA. Find a CFG that generates this language.

For Problems 12 and 13, consider the following nondeterministic PDA: 



Here, the STACK alphabet is again 


Γ = {x}


12. (i) Show that the word aa can be accepted by this PDA by demonstrating a trace of its
path to ACCEPT.

(ii) Show that the word babaaa can be accepted by this PDA by demonstrating a trace
of its path indicating exactly where we must take the branch from READ₁ to
READ₂.

(iii) Show that the string babaaab cannot be accepted. 

(iv) Show that the string babaaaa cannot be accepted. 

13. Show that the language of this machine is 

TRAILINGCOUNT = {s a^(length(s))}

= {any string s followed by as many a’s as s has letters} 

We know that this language is not regular from Chapter 10, Problem 4, and that there is a
CFG that generates it from Chapter 12, Problem 13.

14. Build a deterministic PDA to accept the language {aⁿbⁿ⁺¹}. (As always, when unspeci-
fied, the condition on n is assumed to be n = 1, 2, 3, . . . .)

15. Let the input alphabet be Σ = {a b c} and L be the language of all words in which
all the a's come before the b's and there are the same number of a's as b's and arbitrarily
many c's that can be in front, behind, or among the a's and b's. Some words in L are
abc, caabcb, ccacaabcccbccbc.

(i) Write out all the words in this language with six or fewer letters. 

(ii) Show that the language L is not regular. 

(iii) Find a PDA (deterministic) that accepts L. 

(iv) Find a CFG that generates L. 

16. Find a PDA (nondeterministic) that accepts all PALINDROME where the alphabet is
Σ = {a b} by combining the EVENPALINDROME part with the ODDPALINDROME
PDA. This is not the same machine for PALINDROME as produced in the next chapter— 
so do not cheat. 

17. We have seen that an FA with N states can be converted into an equivalent PDA with N 
READ states (and no POP states). Show that for any FA with N states there is some 
PDA with only one READ state (and several POP states), but that uses N different 
STACK symbols and accepts the same language. 

18. Let L be some regular language in which all the words happen to have an even length. 
Let us define the new language Twist(L) to be the set of all the words of L twisted, 
where by twisted we mean the first and second letters have been interchanged, the third 
and fourth letters have been interchanged, and so on. For example, if 

L = {ba abba babb . . .}

Twist(L) = {ab baab abbb . . .} 

Build a PDA that accepts Twist(L) 

19. Given any language L that does not include Λ, let us define its cousin language |L| as
follows: For any string of a's and b's, if the word formed by concatenating the second,
fourth, sixth, . . . letters of this string is a word in L, then the whole string is a word in
|L|. For instance, if bbb is a word in L, then ababbbb and bbababa are both words in |L|.

(i) Show that if there is some PDA that accepts L, then there is some PDA that accepts | L |. 

(ii) If L is regular, is | L | necessarily regular too? 

20. Let L be the language of all words that have the same number of a's and b's and that, as
we read them from left to right, never have more b's than a's. For example,

abaaabbabb 

is good but 

abaabbba 

is no good because at a certain point we had four b's but only three a's.

In Chapter 10, Problem 19, we proved that this language is nonregular when we 
called it PARENTHESES. 

All the words in L with six letters are 

aaabbb aababb aabbab 

abaabb ababab 

(i) Write out all the words in L with eight letters (there are 14). 

(ii) Find a PDA that accepts L. 

(iii) Prove that L is not regular. 

(iv) Find a CFG that defines L. 
BUILDING A PDA FOR EVERY CFG

We are now ready to prove that the set of all languages accepted by PDAs is the same as the
set of all languages generated by CFGs.

We prove this in two steps. 


THEOREM 30 

Given a CFG that generates the language L, there is a PDA that accepts exactly L. 


THEOREM 31 

Given a PDA that accepts the language L, there exists a CFG that generates exactly L. 

These two important theorems were both discovered independently by Schützenberger,
Chomsky, and Evey.


PROOF OF THEOREM 30 

The proof will be by constructive algorithm. From Theorem 26 in Chapter 13 (p. 278), we
can assume that the CFG is in CNF. (The problem of Λ will be handled later.)

Before we describe the algorithm that associates a PDA with a given CFG in its most
general form, we shall illustrate it on one particular example. Let us consider the following
CFG in CNF:

S → SB

S → AB

A → CC

B → b

C → a


We now propose the following nondeterministic PDA: 
In this machine, the STACK alphabet is

Γ = {S A B C}

whereas the TAPE alphabet is only 

Σ = {a b}

We begin by pushing the symbol S onto the top of the STACK. We then enter the busiest 
state of this PDA, the central POP. In this state, we read the top character of the STACK. 

The STACK will always contain nonterminals exclusively. Two things are possible when
we pop the top of the STACK. Either we replace the removed nonterminal with two other non-
terminals, thereby simulating a production (these are the edges pointing downward), or else we
do not replace the nonterminal at all but instead we go to a READ state, which insists we read a
specific terminal from the TAPE or else it crashes (these edges point upward). To get to AC-
CEPT, we must have encountered READ states that wanted to read exactly those letters that
were originally on the INPUT TAPE in their exact order. We now show that to do this means we
have simulated a leftmost derivation of the input string in this CFG.

Let us consider a specific example. The word aab can be generated by leftmost deriva-
tion in this grammar as follows:

Working-String Generation    Production Used

S ⇒ AB                       S → AB     Step 1
  ⇒ CCB                      A → CC     Step 2
  ⇒ aCB                      C → a      Step 3
  ⇒ aaB                      C → a      Step 4
  ⇒ aab                      B → b      Step 5


In CNF, all working strings in leftmost derivations have the form 

(string of terminals) (string of Nonterminals)

To run this word on this PDA, we must follow the same sequence of productions, keeping
the STACK contents at all times the same as the string of nonterminals in the working string
of the derivation.
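The correspondence just stated can be checked mechanically. The Python sketch below uses our own helper, `replay`, and assumes single-letter nonterminals written in uppercase and terminals in lowercase. It takes the productions of a leftmost derivation in the order they are used, imitates the central POP, the PUSHes, and the READs, and asserts after every step that the working string equals the cancelled TAPE letters followed by the STACK contents.

```python
def replay(steps, word):
    """Replay a leftmost CNF derivation of `word` on the PDA of Theorem 30.

    steps: productions (lhs, rhs) in the order the leftmost derivation uses them;
    rhs is two nonterminals such as "AB" or one terminal such as "a".
    Returns (working string, cancelled TAPE letters, STACK) after each step,
    with the top of the STACK written first, as in the rotated STACK of the text.
    """
    working, cancelled, stack = "S", "", "S"   # just after START -> PUSH S
    history = [(working, cancelled, stack)]
    for lhs, rhs in steps:
        i = len(cancelled)               # position of the leftmost nonterminal
        # Central POP: the top of the STACK must be the nonterminal we replace.
        if not stack or stack[0] != lhs:
            raise ValueError(f"POP expected {lhs}, found {stack[:1] or 'Λ'}")
        stack = stack[1:]
        if rhs.isupper():                # N -> XY: PUSH Y, then PUSH X
            stack = rhs + stack
        elif word[i:i + 1] == rhs:       # N -> t: READ must find t on the TAPE
            cancelled += rhs
        else:
            raise ValueError(f"READ expected {rhs!r} at TAPE cell {i}")
        working = working[:i] + rhs + working[i + 1:]
        # The invariant from the text: working string = cancelled letters + STACK.
        assert working == cancelled + stack
        history.append((working, cancelled, stack))
    return history
```

For the derivation of aab in the first example, the STACK contents come out as S, AB, CCB, CB, B, and finally empty.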

We begin at START with 


STACK    TAPE
Λ        aab


Immediately, we push the symbol S onto the STACK: 


STACK    TAPE
S        aab


We then head into the central POP. The first production we must simulate is S → AB. We pop
the S and then we PUSH B, PUSH A, arriving at this:

STACK    TAPE
AB       aab


Note that the contents of the STACK are the same as the string of nonterminals in the work-
ing string of the derivation after step 1.

We again feed back into the central POP. The production we must now simulate is
A → CC. This is done by popping the A and following the path PUSH C, PUSH C.

The situation is now

STACK    TAPE
CCB      aab


Notice that here again, the contents of the STACK are the same as the string of nonter-
minals in the working string of the derivation after step 2.

Again, we feed back into the central POP. This time we must simulate the production
C → a. We do this by popping the C and then reading the a from the TAPE. This leaves

STACK    TAPE
CB       a̶ab


We do not keep any terminals in the STACK, only the nonterminal part of the working
string. Again, the STACK contains the string of nonterminals in step 3 of the derivation.
However, the terminal that would have appeared in front of these in the working string has
been cancelled from the front of the TAPE. Instead of keeping the terminals in the STACK,
we erase them from the INPUT TAPE to ensure a perfect match.

The next production we must simulate is another C → a. Again, we POP C and READ
a. This leaves

STACK    TAPE
B        a̶a̶b

Here again, we can see that the contents of the STACK are the string of nonterminals in 
the working string in step 4 of the derivation. The whole working string is aaB; the terminal 
part aa corresponds to what has been struck from the TAPE. 

This time when we enter the central POP, we simulate the last production in the deriva-
tion, B → b. We pop the B and read the b. This leaves

STACK    TAPE
Λ        a̶a̶b̶

This Λ represents the fact that there are no nonterminals left in the working string after
step 5. This, of course, means that the generation of the word is complete.

We now reenter the POP, and we must make sure that both STACK and TAPE are
empty:

POP (Λ) → READ₃ (Λ) → ACCEPT

The general principle is clear. To accept a word, we must follow its leftmost derivation 
from the CFG. If, in some CFG, the word is 

ababbbaab 

and at some point in its leftmost Chomsky derivation, we have the working string 

ababbZWV 

then at this point in the corresponding PDA-processing the status of the STACK and TAPE 
should be 


STACK    TAPE
ZWV      a̶b̶a̶b̶b̶baab

the used-up part of the TAPE being the string of terminals and the contents of the STACK 
being the string of nonterminals of the working string. This process continues until we have 
derived the entire word. We then have 


STACK    TAPE
Λ        a̶b̶a̶b̶b̶b̶a̶a̶b̶

At this point, we POP Λ, go to READ₃, and ACCEPT.

There is noticeable nondeterminism in this machine at the POP state. This parallels, re-
flects, and simulates the nondeterminism present in the process of generating a word. In a
leftmost derivation, if we are to replace the nonterminal N, we have one possibility for each
production that has N as the left side. Similarly, in this PDA we have one path leaving POP
for each of these possible productions. Just as the one set of productions must generate any
word in the language, the one machine must have a path to accept any legal word once it sits
on the INPUT TAPE. The point is that the choices of which lines to take out of the central
POP tell us how to generate the word through leftmost derivation, because each branch rep-
resents a production.
It should also be clear that any input string that reaches ACCEPT has gotten there by
having each of its letters read by simulating Chomsky productions of the form

Nonterminal → terminal

This means that we have necessarily formed a complete leftmost derivation of this word 
through CFG productions with no nonterminals left over in the STACK. Therefore, every 
word accepted by this PDA is in the language of the CFG. 

One more example may be helpful. Consider the randomly chosen CFG (in CNF) be-
low:


We propose the following PDA:


We shall trace simultaneously how the word baaab can be generated by this CFG and
how it can be accepted by this PDA.


TAPE 


STACK 


STATE 


LEFTMOST DERIVATION 


baaab 


START 


baaab 


PUSHS 


baaab 


baaab 


PUSH B 


baaab 


PUSH A 


baaab 


baaab 


PUSH B 


baaab 


PUSH B 


PUSH B 


PUSH B 


PUSH B 


- 




PUSH A 


PUSH B 


PUSH A 


POP 


BB 


baaab 
LEFTMOST DERIVATION    STATE      STACK    TAPE
⇒ bBB                  READ₃      BB       b̶aaab
                       POP        B        b̶aaab
                       PUSH B     BB       b̶aaab
⇒ bABB                 PUSH A     ABB      b̶aaab
                       POP        BB       b̶aaab
⇒ baBB                 READ₁      BB       b̶a̶aab
                       POP        B        b̶a̶aab
⇒ baaB                 READ₂      B        b̶a̶a̶ab
                       POP        Λ        b̶a̶a̶ab
                       PUSH B     B        b̶a̶a̶ab
                       PUSH A     AB       b̶a̶a̶ab
                       POP        B        b̶a̶a̶ab
                       READ₁      B        b̶a̶a̶a̶b
                       POP        Λ        b̶a̶a̶a̶b
⇒ baaab                READ₃      Λ        b̶a̶a̶a̶b̶
                       POP        Λ        b̶a̶a̶a̶b̶
                       READ₄      Λ        b̶a̶a̶a̶b̶
                       ACCEPT     Λ        b̶a̶a̶a̶b̶


At every stage, we have the following equivalence: 

Working string 

= (letters cancelled from TAPE) (string of Nonterminals from STACK) 

At the beginning, this means 

Working string = S 

Letters cancelled = none 

String of Nonterminals in STACK = S 

At the end, this means 

Working string = the whole word 
Letters cancelled = all 
STACK = Λ

Now that we understand this example, we can give the rules for the general case. 

ALGORITHM 

If we are given a CFG in CNF as follows:

X₁ → X₂X₃
X₁ → X₃X₄
X₂ → X₂X₂
. . .
X₃ → a
X₄ → a
X₅ → b
. . .

where the start symbol S = X₁ and the other nonterminals are X₂, X₃, . . . , we build the fol-
lowing machine.

Begin with

START → PUSH S → POP



For each production of the form

Xᵢ → XⱼXₖ

we include this circuit from the POP back to itself:

POP (Xᵢ) → PUSH Xₖ → PUSH Xⱼ → POP



For all productions of the form

Xᵢ → b

we include this circuit:

POP (Xᵢ) → READ (b) → POP







: 














When the STACK is finally empty, which means we have converted our last nonterminal to
a terminal and the terminals have matched the INPUT TAPE, we follow this path:

POP (Λ) → READ (Λ) → ACCEPT



From the reasons and examples given above, we know that all words generated by the
CFG will be accepted by this machine and all words accepted will have leftmost derivations
in the CFG.
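As a check on the construction, here is a minimal Python sketch of the machine it produces, run by depth-first search over (TAPE position, STACK) configurations. The grammar encoding and the name `cnf_pda_accepts` are our own assumptions. The pruning rule uses the fact that in CNF without S → Λ every nonterminal derives at least one letter, so a STACK longer than the unread TAPE can never lead to ACCEPT; that rule also guarantees the search terminates.

```python
from functools import lru_cache


def cnf_pda_accepts(rules, word, start="S"):
    """Run the Theorem 30 PDA built from a CNF grammar on `word`.

    rules: nonterminal -> list of right sides, each either a 2-tuple of
    nonterminals (the circuit POP -> PUSH -> PUSH) or a one-letter terminal
    string (the circuit POP -> READ).  The STACK is a tuple, top at index 0.
    """
    @lru_cache(maxsize=None)
    def run(pos, stack):
        if not stack:
            # POP finds Λ, so the READ that follows must also find Λ.
            return pos == len(word)
        if len(stack) > len(word) - pos:
            return False
        top, rest = stack[0], stack[1:]
        for rhs in rules.get(top, []):
            if isinstance(rhs, tuple):                 # N -> XY: PUSH Y, PUSH X
                if run(pos, rhs + rest):
                    return True
            elif pos < len(word) and word[pos] == rhs:  # N -> t: READ t
                if run(pos + 1, rest):
                    return True
        return False

    return run(0, (start,))
```

With the grammar of the first example, S → SB | AB, A → CC, B → b, C → a, the words aab and aabb are accepted, while ab and aaab are not.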

This does not quite finish the proof. We began by assuming that the CFG was in CNF,
but there are some context-free languages that cannot be put into CNF. They are the lan-
guages that include the word Λ. In this case, we can convert all productions into one of the
two forms acceptable to CNF, while the word Λ must still be included.

To include this word, we need to add another circuit to the PDA, a simple loop at 
the POP: 


This kills the nonterminal S without replacing it with anything and the next time we enter the 
POP, we get a blank and proceed to accept the word. ■ 


EXAMPLE 


The language PALINDROME (including Λ) can be generated by the following CFG in CNF
(plus one Λ-production):

S → AR₁      S → a

R₁ → SA      S → b

S → BR₂      A → a

R₂ → SB      B → b

S → AA       S → Λ

S → BB

The PDA that the algorithm in the proof of Theorem 30 instructs us to build is 


Let us examine how the input string abaaba is accepted by this PDA.
LEFTMOST DERIVATION    STATE      STACK    TAPE
                       START      Λ        abaaba
S                      PUSH S     S        abaaba
                       POP        Λ        abaaba
                       PUSH R₁    R₁       abaaba
⇒ AR₁                  PUSH A     AR₁      abaaba
                       POP        R₁       abaaba
⇒ aR₁                  READ₃      R₁       a̶baaba
                       POP        Λ        a̶baaba
                       PUSH A     A        a̶baaba
⇒ aSA                  PUSH S     SA       a̶baaba
                       POP        A        a̶baaba
                       PUSH R₂    R₂A      a̶baaba
⇒ aBR₂A                PUSH B     BR₂A     a̶baaba
                       POP        R₂A      a̶baaba
⇒ abR₂A                READ₂      R₂A      a̶b̶aaba
                       POP        A        a̶b̶aaba
                       PUSH B     BA       a̶b̶aaba
⇒ abSBA                PUSH S     SBA      a̶b̶aaba
                       POP        BA       a̶b̶aaba
                       PUSH A     ABA      a̶b̶aaba
⇒ abAABA               PUSH A     AABA     a̶b̶aaba
                       POP        ABA      a̶b̶aaba
⇒ abaABA               READ₃      ABA      a̶b̶a̶aba
                       POP        BA       a̶b̶a̶aba
⇒ abaaBA               READ₃      BA       a̶b̶a̶a̶ba
                       POP        A        a̶b̶a̶a̶ba
⇒ abaabA               READ₂      A        a̶b̶a̶a̶b̶a
                       POP        Λ        a̶b̶a̶a̶b̶a
⇒ abaaba               READ₃      Λ        a̶b̶a̶a̶b̶a̶
                       POP        Λ        a̶b̶a̶a̶b̶a̶
                       READ₄      Λ        a̶b̶a̶a̶b̶a̶
                       ACCEPT     Λ        a̶b̶a̶a̶b̶a̶
Notice how different this is from the PDAs we developed in Chapter 14 for the languages
EVENPALINDROME and ODDPALINDROME.
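The extra Λ loop at the POP is easy to add to a simulation as well. The Python sketch below (our own names and encoding; S → Λ is written as an empty tuple) runs the machine of Theorem 30 for this PALINDROME grammar by breadth-first search over (TAPE position, STACK) configurations. Since S may now vanish, the pruning rule counts only symbols that cannot derive Λ, here everything except S; a hard bound on the number of configurations is an extra safeguard of ours.

```python
from collections import deque


def accepts_with_lambda(rules, word, start="S", nullable=("S",), max_configs=100_000):
    """Search the Theorem 30 PDA built from `rules`, including the extra POP
    loop that kills S (the production S -> Λ, written as an empty tuple).

    rules: nonterminal -> list of right sides; a tuple of nonterminals
    (length 2, or 0 for S -> Λ) or a one-letter terminal string.
    The STACK is a tuple with its top at index 0.
    """
    frontier = deque([(0, (start,))])
    seen = set()
    while frontier:
        pos, stack = frontier.popleft()
        if (pos, stack) in seen:
            continue
        seen.add((pos, stack))
        if len(seen) > max_configs:
            raise RuntimeError("search bound exceeded")
        if not stack:
            # POP finds Λ; the READ that follows must also find Λ.
            if pos == len(word):
                return True
            continue
        # Every non-nullable symbol on the STACK must still consume a letter.
        if sum(sym not in nullable for sym in stack) > len(word) - pos:
            continue
        top, rest = stack[0], stack[1:]
        for rhs in rules.get(top, []):
            if isinstance(rhs, tuple):
                frontier.append((pos, rhs + rest))   # PUSHes (none for S -> Λ)
            elif pos < len(word) and word[pos] == rhs:
                frontier.append((pos + 1, rest))    # READ the terminal
    return False


PALINDROME = {
    "S": [("A", "R1"), ("B", "R2"), ("A", "A"), ("B", "B"), "a", "b", ()],
    "R1": [("S", "A")],
    "R2": [("S", "B")],
    "A": ["a"],
    "B": ["b"],
}
```

With this encoding, abaaba, aba, and Λ (the empty string) are accepted, while ab and abb are not.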


BUILDING A CFG FOR EVERY PDA 

Now we have to prove the other half of the equivalence theorem, that every language ac-
cepted by a PDA is context-free.

PROOF OF THEOREM 31 

This is a long proof by constructive algorithm. In fact, it is unquestionably the most tortur-
ous proof in this book; parental consent is required. We shall illustrate each step with a par-
ticular example. It is important, though, to realize that the algorithm we describe operates
successfully on all PDAs and we are not merely proving this theorem for one example alone.

The requirements for a proof are that it convinces and explains. The following argu-
ments should do both if we are sufficiently perseverant.

Before we can convert a PDA into a CFG, we have to convert it into a standard form
which we call conversion form. To achieve this conversion form, it is necessary for us to in-
troduce a new “marker state” called a HERE state. We can put the word HERE into a box
shaped like a READ state in the middle of any edge and we say that we are passing through
that state any time we travel on the edge that it marks. Like the READ and POP states, the
HERE states can be numbered with subscripts.

One use of a HERE state is so that 



Notice that a HERE state does not read the TAPE nor pop the STACK. It just allows us
to describe being on the edge as being in a state. A HERE state is a legal fiction, a state
with no status, but we do permit branching to occur at such points. Because the edges lead-
ing out of HERE states have no labels, this branching is necessarily nondeterministic.

DEFINITION (inside the proof of Theorem 31) 

A PDA is in conversion form if it meets all the following conditions: 

1. There is only one ACCEPT state. 

2. There are no REJECT states. 

3. Every READ or HERE is followed immediately by a POP; that is, every edge leading 
out of any READ or HERE state goes directly into a POP state. 
4. No two POPs exist in a row on the same path without a READ or HERE between them,
whether or not there are any intervening PUSH states. (POPs must be separated by
READs or HEREs.)

5. All branching, deterministic or nondeterministic, occurs at READ or HERE states,
none at POP states, and every edge has only one label (no multiple labels).

6. Even before we get to START, a “bottom of STACK” symbol, $, is placed on the
STACK. If this symbol is ever popped in the processing, it must be replaced immedi-
ately. The STACK is never popped beneath this symbol. Right before entering ACCEPT,
this symbol is popped out and left out.

7. The PDA must begin with the sequence 



8. The entire input string must be read before the machine can accept the word. 
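Conditions 1 through 5 concern only the shape of the machine, so they can be checked by looking at the graph alone. Here is a minimal Python sketch of such a check; the graph encoding (a dictionary of state kinds plus a list of singly labeled edges) and the function name are our own assumptions. Conditions 6 through 8 concern how the machine runs rather than its shape and are not checked here.

```python
def conversion_form_violations(kinds, edges):
    """Return the numbers of the structural conditions (1-5) that fail.

    kinds: state name -> 'START', 'READ', 'HERE', 'POP', 'PUSH', 'ACCEPT', or 'REJECT'.
    edges: list of (source, label, target), one label per edge (None if unlabeled).
    """
    bad = []
    if sum(k == "ACCEPT" for k in kinds.values()) != 1:
        bad.append(1)                      # exactly one ACCEPT
    if any(k == "REJECT" for k in kinds.values()):
        bad.append(2)                      # no REJECT states
    out = {}
    for src, _, dst in edges:
        out.setdefault(src, []).append(dst)
    if any(kinds[s] in ("READ", "HERE") and kinds[d] != "POP" for s, _, d in edges):
        bad.append(3)                      # READ/HERE must feed straight into a POP

    def pop_reachable_through_pushes(state, seen):
        # Walk forward through PUSH states only; meeting a POP breaks condition 4.
        for nxt in out.get(state, []):
            if kinds[nxt] == "POP":
                return True
            if kinds[nxt] == "PUSH" and nxt not in seen:
                seen.add(nxt)
                if pop_reachable_through_pushes(nxt, seen):
                    return True
        return False

    if any(k == "POP" and pop_reachable_through_pushes(s, set()) for s, k in kinds.items()):
        bad.append(4)
    if any(kinds[s] in ("POP", "PUSH") and len(out.get(s, [])) > 1 for s in kinds):
        bad.append(5)                      # no branching at POP (or PUSH) states
    return bad
```

A small machine such as START → POP ($) → PUSH $ → READ, with READ looping through POP ($) → PUSH $ on a and going to POP ($) → ACCEPT on Λ, passes all five checks; adding an edge from the READ straight to a PUSH breaks condition 3.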


It is now our job to show that all the PDAs as we defined them before can be made over
into conversion form without affecting the languages they accept.

Condition 1 is easy to accommodate. If we have a PDA with several ACCEPT states, let 
us simply erase all but one of them and have all the edges that formerly went into the others 
feed into the one remaining. 

Condition 2 is also easy. Because we are dealing with nondeterministic machines, if we
are at a state with no edge labeled with the character we have just read or popped, we simply
crash. For an input string to be accepted, there must be a safe path to ACCEPT; the absence
of such a path is tantamount to REJECT. Therefore, we can erase all REJECT states and the
edges leading to them without affecting the language accepted by the PDA.

Now let us consider condition 3. A READ in a certain PDA might not have a POP im-
mediately following it; we might find something like this:



What we do is insert a POP and immediately put back on the STACK whatever might have
been removed by this additional POP.

We need to have a PUSH for every letter of Γ every time we do this:
We must also modify the funny extra POP x-PUSH x situations that we introduced for
condition 3. Instead of using



which entailed branching at the POP state, we must use the equivalent:


Instead of a deterministic branch at a POP state, we have made a nondeterministic 
branch at a READ or HERE state. 
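The modified gadget for condition 3 can be written as a graph rewrite. The Python sketch below is our own illustration, with a hypothetical graph encoding and generated state names: every edge from a READ or HERE state to a non-POP state is replaced by one branch per STACK character (including $), each branch popping that character and pushing it straight back. Exactly one branch matches whatever is on top of the STACK, so the rewrite changes nothing except that all the branching now happens at the READ or HERE state, as condition 5 requires.

```python
def insert_pop_push(kinds, edges, gamma):
    """Make every edge out of a READ or HERE state lead directly into a POP.

    An edge src --label--> dst whose target is not a POP is replaced, for each
    STACK character c in gamma plus the bottom marker $, by
        src --label--> POP_k --c--> PUSH_k (pushes c) --> dst
    Returns the new kinds, the new edge list, and which character each new PUSH pushes.
    Generated names POP_k and PUSH_k are assumed not to clash with existing states.
    """
    kinds = dict(kinds)
    new_edges, pushes = [], {}
    count = 0
    for src, label, dst in edges:
        if kinds[src] in ("READ", "HERE") and kinds[dst] != "POP":
            for c in sorted(set(gamma) | {"$"}):
                count += 1
                pop, push = f"POP_{count}", f"PUSH_{count}"
                kinds[pop], kinds[push] = "POP", "PUSH"
                pushes[push] = c
                new_edges += [(src, label, pop), (pop, c, push), (push, None, dst)]
        else:
            new_edges.append((src, label, dst))
    return kinds, new_edges, pushes
```

For a single edge READ₁ --a--> PUSH a with Γ = {a}, the result is two branches out of READ₁, one through a POP that expects $ and one through a POP that expects a.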

Condition 6 is another easy one. We simply presume that the STACK initially looks like

STACK
$


When we change a PDA into conversion form, we must also remember that instead of pop-
ping a Λ from an empty STACK, we shall find the symbol $. If we wanted (for some reason)
to POP several Λ's off of an empty STACK, we shall have to be satisfied with several POP
$-PUSH $ combinations. They work just as well.

If we ever have a PDA that wants to accept an input string without emptying the whole STACK (including $), we could just insert some states that empty the STACK harmlessly right before the ACCEPT, exactly as we did in the proof of Theorem 29 (p. 311).

Condition 7 makes no new demands if the STACK already satisfies condition 6. Condition 8 can be satisfied by the algorithm of Theorem 29 from Chapter 14.

Now let us take a whole PDA and change it into conversion form. The PDA we use is one that accepts the language

\[ \{a^{2n}b^{n}\} = \{aab \quad aaaabb \quad aaaaaabbb \quad \ldots\} \]
The PDA is

[Figure: the original PDA for \(\{a^{2n}b^n\}\)]



Every a from the beginning of the INPUT TAPE is pushed onto the STACK. Then for every b that follows, two a's are popped. Acceptance comes if both the TAPE and the STACK become empty at the same time. The words accepted must therefore be of the form \(a^{2n}b^n\) for \(n = 1, 2, 3, \ldots\)
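The stack discipline just described can be mimicked directly in a few lines. This is a minimal Python sketch of that behavior, not a translation of the lost figure; the function name `accepts_a2n_bn` is hypothetical, and every situation in which the PDA would crash is simply reported as rejection.

```python
def accepts_a2n_bn(word: str) -> bool:
    """Mimic the PDA for {a^(2n) b^n : n >= 1} with an explicit stack."""
    stack = []
    i = 0
    # Every a at the beginning of the TAPE is pushed onto the STACK.
    while i < len(word) and word[i] == "a":
        stack.append("a")
        i += 1
    saw_b = False
    # For every b that follows, two a's are popped.
    while i < len(word):
        if word[i] != "b" or len(stack) < 2:
            return False  # any other letter, or too few a's: the machine crashes
        stack.pop()
        stack.pop()
        saw_b = True
        i += 1
    # Accept only if the TAPE and the STACK become empty at the same time.
    return saw_b and not stack
```

For instance, aaaabb is accepted, while aabb is rejected because the STACK runs out of a's, and aaab is rejected because an a is left on the STACK when the TAPE is empty.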

Here, we have already deleted the REJECT state and useless READ and POP alternative edges. To make this PDA satisfy all the conditions for conversion form, we must remake it into

[Figure: the same PDA redrawn in conversion form]



To begin with, we must start with the sequence demanded by condition 7. This makes us insert a new POP state called POP₄. Now in the original machine, we began a circuit READ₁-PUSH a-READ₁-PUSH a . . . . Because of condition 3, every READ must be followed by a POP, so the pair READ₁-PUSH a must become READ₁-POP₅-PUSH a-PUSH a. The first PUSH is to return the a that was popped out. The second PUSH adds the a to the STACK. The first time through this loop, the top of the STACK does not contain an a yet and what is popped is the $, which must immediately be returned to the STACK.
This is the purpose of the nondeterministic branch POP₆-PUSH $-PUSH a. This branch also adds an a to the STACK. This branch will be taken the first time out of READ₁, but if it is ever taken again, it will cause a CRASH and lead to the acceptance of no new words.

The next violation of conversion form in the original picture was that POP₁ was immediately followed by POP₂ without a READ in between. This is fixed by inserting a HERE state. (There is only one HERE state in this whole machine, so there is no reason to number it.)

The last change is that instead of POP₃ finding a blank, it should find the stack-end symbol $.

The new form of this PDA obviously accepts exactly the same language as before, \(\{a^{2n}b^n\}\).

Now that we have put this PDA into conversion form, we can explain why we ever wanted to impose these eight conditions on a poor helpless machine. Any PDA in conversion form can be considered as a collection of primitive parts, or path segments, each of the following form:


FROM | TO | READ | POP | PUSH
START, READ, or HERE | READ, HERE, or ACCEPT | One or no input letters | Exactly one STACK character | Any string onto the STACK

The states START, READ, HERE, and ACCEPT are called the joints of the machine. Between two consecutive joints on a path, exactly one character is popped and any arbitrary number can be pushed. Because no edge has a multiple label, between any two joints the machine can read no letters at all from the INPUT TAPE or else exactly one specified letter. This was the purpose of imposing all the conversion conditions.

The PDA above can be drawn as a set of joints with “arcs” (path segments) between them, much like a TG:

[Figure: the conversion-form PDA drawn as joints connected by arcs]


Once a PDA is in conversion form, we can describe the entire machine as a list of all the primitive joint-to-joint path segments (the “arcs” mentioned above). Such a list is called a summary table. A summary table for a PDA serves the same purpose as a transition table for an FA. It explains the total action on the inputs without recourse to pictorial representation. This may seem like a step backward, because the pictures make more sense than the tables, which is why we do not commonly use tables for FAs. However, for the purpose of completing the proof of Theorem 31 (which is what we are still in the midst of doing), the summary table will be very useful.
The PDA we have just converted corresponds to the following summary table:


FROM (Where) | TO (Where) | READ (What) | POP (What) | PUSH (What) | ROW (Number)
START | READ₁ | Λ | $ | $ | 1
READ₁ | READ₁ | a | $ | a$ | 2
READ₁ | READ₁ | a | a | aa | 3
READ₁ | HERE | b | a | – | 4
HERE | READ₂ | Λ | a | – | 5
READ₂ | HERE | b | a | – | 6
READ₂ | ACCEPT | Δ | $ | – | 7


In the last column we have assigned a number to each row for our future purposes. Each 
path segment corresponds to one row of the table. 

Notice that in Row₂ we summarized the sequence

PUSH $-PUSH a

as the single PUSH entry a$, because it means add the $ first, then the a.

In our definition of conversion form, we made sure that all branching occurs at the joints 
READ and HERE. This means that no branching can occur in the middle of any row of the 
summary table. 

Every word that can be accepted by the PDA corresponds to some path from START to 
ACCEPT. We can view these paths as made up not of the components “edges” but of the 
components “rows of summary table.” A path is then broken into a sequence of these path 
segments. 

For example, in the PDA above the word aaaabb can be accepted by the machine through the path

START-POP₄-PUSH $-READ₁-POP₆-PUSH $-PUSH a-READ₁-POP₅-PUSH a-PUSH a-READ₁-POP₅-PUSH a-PUSH a-READ₁-POP₅-PUSH a-PUSH a-READ₁-POP₁-HERE-POP₂-READ₂-POP₁-HERE-POP₂-READ₂-POP₃-ACCEPT

This is a nondeterministic machine, and there are other paths that this input could take, 
but they all crash somewhere; only this path leads to acceptance. Instead of this long list of 
states, we could describe the path of this word through the machine as a sequence of rows 
from the summary table. The path above can be described as 

Row₁-Row₂-Row₃-Row₃-Row₃-Row₄-Row₅-Row₆-Row₅-Row₇

Let us repeat that acceptance by a PDA is determined by the existence of a path from 
START to ACCEPT. In FAs, paths correspond in a natural fashion to strings of letters. In a 
PDA paths correspond in a natural way to strings of rows from the summary table. 

The approach that we have taken for PDAs is to define them originally by a pictorial representation and imagine a correspondence between input strings and paths through the machine-graph. To abstract the grammar (CFG) of the language that the PDA accepts, we have had to begin by changing our PDAs first into conversion form and then into summary tables. This is to make an algebraic, nonpictorial representation of our PDAs that we can then convert into a grammar. Most authors define PDAs originally as summary tables of some kind, and the pictorial representations as directed graphs are rarely given. The proof of Theorem 31 in such a treatment is much shorter, because the proof can begin at the point we have just reached. Something is lost, though, in not seeing a PDA as a picture. This is best illustrated by comparing the preceding summary table with the first pictorial representation of the PDA. It is much easier to understand the looping and the language from the picture.

As definitions, both the pictures and the tables describe the same type of language-accepting device. The question of which is superior cannot be answered without knowing the specific application. Our application is education, and the most understandable formulation is the best.

Notice that the HERE state reads nothing from the TAPE, so we have put Λ in the “READ What” column. We could put a dash or a φ there just as well. A blank (Δ) would be wrong, because it means something else; to say that we read a Δ means the TAPE must be empty. A Λ, on the other hand, means by convention that we do not read the TAPE.

The order in which we put the rows in the summary table does not matter as long as 
every path segment of the PDA between two consecutive joints is represented as some row. 

The summary table carries in it all the information that is found in the pictorial representation of the PDA. Every path through the PDA is a sequence of rows of the summary table. However, not every sequence of rows from the summary table represents a viable path. Right now it is very important for us to determine which sequences of rows do correspond to possible paths through the PDA, because the paths are directly related to the language accepted.

Some sequences of rows are impossible; for example, we cannot immediately follow Row₄ with Row₆ because Row₄ leaves us in HERE, while Row₆ begins in READ₂. We must always be careful that the end joints connect up logically.

This requirement is necessary but not sufficient to guarantee that a sequence of rows can be a path. Row₁ leaves us in READ₁ and Row₃ starts in READ₁, yet Row₁-Row₃ cannot be the beginning of a path. This is because Row₁ pushes a $, whereas Row₃, which pops an a, obviously presumes that the top of the STACK is an a. We must have some information about the STACK before we can string together rows.

Even if we arranged the rows so that the pushes and pops match up, we still might get into trouble. A path formed by a sequence of rows with four Row₃'s and six Row₅'s is impossible. This is true for a subtle reason. Six Row₅'s will pop six a's from the STACK; however, because Row₂ can only be used once to obtain one a in the STACK and four Row₃'s can contribute only four more a's to the STACK, we are short one a.

The question of which sequences of rows make up a path is very tricky. To represent a 
path, a sequence of rows must be joint-consistent (the rows meet up end to end) and 
STACK-consistent (when a row pops a character, it should be there, at the top of the 
STACK). 
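Both consistency checks can be made mechanical. Below is a minimal Python sketch, assuming the summary table above is encoded as tuples (the names `Row`, `TABLE`, and `trace` are hypothetical, and READ₁, READ₂ are written READ1, READ2). It walks a sequence of row numbers from START, rejects it as soon as a row starts at the wrong joint or pops a character that is not on top of the STACK, and otherwise returns the input letters the rows read.

```python
from typing import List, NamedTuple, Optional

class Row(NamedTuple):
    src: str    # FROM joint
    dst: str    # TO joint
    read: str   # "" stands for Λ (read nothing); "Δ" means the TAPE must be empty
    pop: str    # exactly one STACK character
    push: str   # pushed string, written with the new top of the STACK first

# Summary table for the conversion-form PDA for {a^(2n) b^n}.
TABLE = {
    1: Row("START", "READ1", "", "$", "$"),
    2: Row("READ1", "READ1", "a", "$", "a$"),
    3: Row("READ1", "READ1", "a", "a", "aa"),
    4: Row("READ1", "HERE", "b", "a", ""),
    5: Row("HERE", "READ2", "", "a", ""),
    6: Row("READ2", "HERE", "b", "a", ""),
    7: Row("READ2", "ACCEPT", "Δ", "$", ""),
}

def trace(rows: List[int]) -> Optional[str]:
    """Return the input word read along `rows`, or None if the sequence is not a
    joint- and STACK-consistent path from START to ACCEPT."""
    stack = ["$"]      # condition 6: the STACK starts with $; stack[-1] is the top
    joint = "START"
    letters = []
    for n in rows:
        row = TABLE[n]
        if row.src != joint:                   # joint-consistency
            return None
        if not stack or stack[-1] != row.pop:  # STACK-consistency
            return None
        stack.pop()
        for ch in reversed(row.push):          # so that row.push[0] ends on top
            stack.append(ch)
        if row.read not in ("", "Δ"):
            letters.append(row.read)
        joint = row.dst
    return "".join(letters) if joint == "ACCEPT" else None
```

On the row sequence Row₁-Row₂-Row₃-Row₃-Row₃-Row₄-Row₅-Row₆-Row₅-Row₇ it returns aaaabb, while Row₁-Row₃ fails the STACK check and any sequence that puts Row₆ right after Row₄ fails the joint check. Because the letters are recovered from the rows alone, a consistent sequence determines the word that travels along it.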

Let us now define the row language of a particular PDA represented by a summary table. It is the language whose alphabet letters are the names of the rows in the summary table:

Σ = {Row₁ Row₂ . . . Row₇}

and has as legal words all those sequences of alphabet letters that correspond to paths from START to ACCEPT that might possibly be followed by some input strings, that is, all sequences from START to ACCEPT that are joint-consistent and STACK-consistent.

Clearly, all valid words in this language begin with Row₁ and end with Row₇, but as we saw above, there are more requirements than just those.
Consider, for example,

Row₅ Row₅ Row₃ Row₆

This is a string of length 4, but this string is not a word in the row language for three reasons: (1) It does not represent a path that begins with START or ends with ACCEPT; (2) it is not joint-consistent; (3) it is not STACK-consistent.

Not only are we going to look for rules to tell us which strings of rows are words, but we shall produce a CFG for the row language. From this CFG, we can produce another CFG, a grammar for the language of strings of a's and b's accepted by the original PDA.

Let us pause here to outline the global strategy of this proof: 

1. We start with any PDA drawn as defined in Chapter 14. 

2. We redraw the PDA to meet the requirements of conversion form. 

3. From the machine in conversion form, we build a summary table and number the rows. 

4. Every word accepted by the PDA corresponds to at least one path from START to 
ACCEPT and, as we shall soon see, every STACK-consistent path from START to 
ACCEPT corresponds to some word. Therefore, we define the row language to be the 
set of all sequences of rows that correspond to paths. 

5. We determine a CFG that generates all the words in the row language. 

6. We convert this CFG for the row language into a CFG that generates all the words in the
original language of a’s and b’s that are accepted by the PDA, thus proving Theorem 31.

We are now up to step 5. 

We had to build half this house before we could take our first look at the blueprints. 

One thing we have to do is to keep track of the contents of the STACK. Since we are going to want to produce a CFG that generates the row language, we need to introduce nonterminals that contain the information we need to ensure joint- and STACK-consistency. We have to know about the beginning and end positions of the path segments to which certain row strings correspond and about the contents of the STACK. It is not necessary to maintain any information about what characters are read from the TAPE. If what is on the TAPE is what the rows want to read, then the input string will be accepted. Once we know what the rows are, we can find an input word that gives them what they want to read. We shall see the implications of this observation later, but every joint- and STACK-consistent path actually is the path through the PDA taken by some input string.

The nonterminals in the row language grammar have the following form: 

Net(X, Y, Z) 

where X and Y can be any joints: START, READ, HERE, or ACCEPT, and Z is any character from the stack alphabet Γ. This whole expression is one nonterminal even though it is at least 10 printer’s symbols long. These odd nonterminals stand for the following:

There is some path going from joint X to joint Y, perhaps passing through some 
other joints (READ or HERE states), which has the net effect on the STACK of 
removing the symbol Z, where by “net effect” we mean that although there might 
be extra things put onto the STACK during the path, they are eventually removed 
and the STACK is never popped below the initial Z that is on the top of the STACK 
to begin with, and that is popped out somewhere along the way. 
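The “net effect” condition can be tested on a concrete list of STACK operations. This is a minimal Python sketch; the function name and the (operation, character) encoding are hypothetical. Only the initial Z is known, so any attempt to pop beneath it counts as failure, because a statement Net(X, Y, Z) presumes nothing about the rest of the STACK.

```python
def net_effect_is_pop(ops, z):
    """Return True if the STACK operations in `ops` have the net effect of
    popping exactly the symbol z, never touching the STACK below that z.

    `ops` is a list of ("POP", c) or ("PUSH", c) pairs, applied in order.
    """
    stack = [z]  # only the top z is known; whatever lies beneath it is "?"
    for kind, c in ops:
        if kind == "PUSH":
            stack.append(c)
        elif kind == "POP":
            if not stack:
                return False  # this POP would reach into the unknown part "?"
            if stack.pop() != c:
                return False  # a different character is on top: the path crashes
        else:
            raise ValueError(f"unknown STACK operation {kind!r}")
    # The initial z and everything pushed along the way must all be gone.
    return not stack
```

For example, POP Z, PUSH a, PUSH b, POP b, POP a passes the test, whereas POP Z followed by POP b fails, because the second POP presumes knowledge of what was beneath Z.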

We have never seen a nonterminal be such a complicated-looking item as Net(X, Y, Z), but we have had nonterminals before with meanings that could be expressed in a sentence (as in the CFG for EQUAL).

This complicated description of the “net effect” on the STACK means, for instance, that the sequence of STACK operations

[Figure: a sequence of STACK operations that begins by popping the top Z]

has the net effect of popping one Z because it represents these STACK states:

[Figure: the successive STACK contents, each drawn above the unexamined part “?”]

The net STACK effect is the same as the simple POP Z, and no character was presumed to be in the STACK below the top Z. The symbol “?” here represents the unknown and unexamined part of the STACK. The picture

[Figure: a single POP Z]

by itself is also an acceptable sequence for a STACK operation governed by a nonterminal Net(X, Y, Z). However,

[Figure: a sequence of STACK operations ending with PUSH a]

is not, because it presupposes knowledge about what is in the STACK under the top Z. If there were a b under the Z initially, this sequence would fail (crash). We never presume knowledge of what is available in the STACK in the statement Net(X, Y, Z) beyond knowing that Z is on top.

For a given PDA, some of the possible sentences Net(X, Y, Z) are true and some are false. Our job, given a PDA, is to determine which Net statements are true and how they fit together. To do this, we must first examine every row of the table to see which ones have the net effect of popping exactly one letter. There are other paths that are composed of several rows that can also be described by a single Net statement, but we shall discover these by a separate procedure later.

Let us recall the summary table that we have developed for the PDA for the language \(\{a^{2n}b^n\}\). Row₄ of this table says essentially

Net(READ₁, HERE, a)

which means, “We can go from READ₁ to HERE at the total cost of popping an a from the top of the stack.”

In other words, Row₄ is a single Net row. However, let us suppose that we have a row in the summary table for some arbitrary PDA that looks like this:

FROM | TO | READ | POP | PUSH | ROW
READ₉ | READ₃ | b | b | abb | 11

As it stands, Row₁₁ is not a Net-style sentence because the trip from READ₉ to READ₃ does not subtract one letter from the STACK; the net effect is rather that it adds two. However, there is a particular way that Row₁₁ can interact with some other Net-style sentences. For instance, if we knew that the following three nonterminals could be realized as path segments for this machine

Net(READ₃, READ₇, a)    Net(READ₇, READ₁, b)    Net(READ₁, READ₈, b)

then, using Row₁₁, we could conclude that the nonterminal

Net(READ₉, READ₈, b)

could also be realized as a path segment. This is because we can go first from READ₉ to READ₃ using Row₁₁, which eats the b at the top of the STACK but leaves the letters abb in its place, with the net effect of adding ab. The first a takes us from READ₃ to READ₇ by the path implied by Net(READ₃, READ₇, a). The next b takes us from READ₇ along some path to READ₁, as guaranteed by Net(READ₇, READ₁, b). Then the last b takes us from READ₁ to READ₈ by some path guaranteed by the last Net. The total cost of the trip has been the top b. Thanks to the abb we added, during this whole trip we have never popped the STACK beneath the top b.

Let us write this as 

Net(READ₉, READ₈, b)
→ Row₁₁ Net(READ₃, READ₇, a) Net(READ₇, READ₁, b) Net(READ₁, READ₈, b)

In other words, the sentence that says that we can go from READ₉ to READ₈ at the cost of b can be replaced by the concatenation of the sentences Row₁₁, Net . . . Net . . . Net . . . .

This will be a production in our row language. We begin with the nonterminal Net(READ₉, READ₈, b), and we produce a string that has one terminal, Row₁₁, and some nonterminals, Net . . . Net . . . Net . . . . Notice that Row₁₁ takes us from READ₉ to READ₃, the first Net from READ₃ to READ₇, the second from READ₇ to READ₁, and the last from READ₁ to READ₈, giving us the trip promised on the left side of the production at the appropriate cost.

This hypothetical Row₁₁ that we are presuming exists for some PDA could also be used in other productions, for example,

Net(READ₉, READ₁₀, b)
→ Row₁₁ Net(READ₃, READ₂, a) Net(READ₂, READ₂, b) Net(READ₂, READ₁₀, b)

assuming, of course, that these additional Net's are available, by which we mean realizable by actual paths.

The general formulation for creating productions from rows of the summary table is as 
follows: 

If the summary table includes the row

FROM | TO | READ | POP | PUSH | ROW
READ_x | READ_y | u | w | m₁m₂ . . . mₙ | i

then for any sequence of joint states S₁, S₂, . . . , Sₙ, we include the row language CFG production

\[ \mathrm{Net}(\mathrm{READ}_x, S_n, w) \to \mathrm{Row}_i\,\mathrm{Net}(\mathrm{READ}_y, S_1, m_1)\,\mathrm{Net}(S_1, S_2, m_2) \cdots \mathrm{Net}(S_{n-1}, S_n, m_n) \]
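The general rule is a purely mechanical template. The following is a minimal Python sketch with hypothetical names, in which a nonterminal Net(X, Y, Z) is encoded as the tuple ("Net", X, Y, Z), and READ₉ is written READ9 and so on. The usage example reproduces the two Row₁₁ productions above.

```python
from typing import List, Sequence, Tuple, Union

Net = Tuple[str, str, str, str]  # ("Net", from_joint, to_joint, stack_char)
Symbol = Union[str, Net]         # a Row terminal or a Net nonterminal

def make_production(row: str, src: str, dst: str, pop: str, push: str,
                    states: Sequence[str]) -> Tuple[Net, List[Symbol]]:
    """Net(src, S_n, pop) -> row Net(dst, S_1, m_1) ... Net(S_{n-1}, S_n, m_n)."""
    if not push:
        # A row that pushes nothing is a Net by itself: Net(src, dst, pop) -> row.
        if states:
            raise ValueError("a row that pushes nothing takes no joint states")
        return ("Net", src, dst, pop), [row]
    if len(states) != len(push):
        raise ValueError("need exactly one joint state per pushed character")
    rhs: List[Symbol] = [row]
    prev = dst
    for s, m in zip(states, push):  # m_1 is the character left on top
        rhs.append(("Net", prev, s, m))
        prev = s
    return ("Net", src, states[-1], pop), rhs

def show(prod: Tuple[Net, List[Symbol]]) -> str:
    """Render a production in the book's notation, using -> for the arrow."""
    def fmt(sym: Symbol) -> str:
        return sym if isinstance(sym, str) else f"Net({sym[1]}, {sym[2]}, {sym[3]})"
    lhs, rhs = prod
    return f"{fmt(lhs)} -> " + " ".join(fmt(s) for s in rhs)
```

Calling `make_production("Row11", "READ9", "READ3", "b", "abb", ["READ7", "READ1", "READ8"])` and printing it with `show` gives Net(READ9, READ8, b) -> Row11 Net(READ3, READ7, a) Net(READ7, READ1, b) Net(READ1, READ8, b). The function only writes productions down; whether they are ever useful is a separate question.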

This is a great number of productions and a large dose of generality all at once. Let us illustrate the point on an outrageous, ludicrous example.

Suppose that someone offered us a ride from Philadelphia to L.A. if we would trade him our old socks for his sunglasses and false teeth. We would say “terrific” because we could then go from Philadelphia to Denver for the price of the old socks. How? First, we get a ride to L.A. by trading the socks to him for the sunglasses and false teeth. Then, we find someone who will drive us from L.A. to Chicago for a pair of sunglasses and another nice guy who will drive us from Chicago to Denver for a pair of false teeth.

FROM | TO | READ | POP | PUSH
Phil. | L.A. | anything | socks | sunglasses, false teeth

Net(Phil., Denver, socks) → Row₇₇ Net(L.A., Chi., shades) Net(Chi., Denver, teeth)

The fact that we have written this production does not mean that it can ever be part of the derivation of an actual word in the row language. The idea might look good on paper, but where do we find the clown who will drive us from Chicago to Denver for the used choppers?

So too with the other productions formed by this general rule.

We can replace Net(this and that) with Net(such and such), but can we ever boil it all down to a string of rows? We have seen in working with CFGs in general that replacing one nonterminal with a string of others does not always lead to a word in the language.

In the example of the PDA for which we built the summary table, Row₃ says that we can go from READ₁ back to READ₁ and replace an a with aa. This allows the formation of many productions of the form

Net(READ₁, X, a) → Row₃ Net(READ₁, Y, a) Net(Y, X, a)

where X and Y could be READ₁ or READ₂, or even HERE. Also, X could be ACCEPT, as in this possibility:

Net(READ₁, ACCEPT, a) → Row₃ Net(READ₁, READ₂, a) Net(READ₂, ACCEPT, a)

There are three rules for creating productions in what we shall prove is a CFG for the row language of a PDA presented to us in a summary table.

Rule 1 We have the nonterminal S, which starts the whole show, and the production

S → Net(START, ACCEPT, $)

which means that we can consider any total path through the machine as a trip from START to ACCEPT at the cost of popping one symbol, $, and never referring to the STACK below $.

This rule is the same for all PDAs.

Rule 2 For every row of the summary table that has no PUSH entry, such as

FROM | TO | READ | POP | PUSH | ROW
X | Y | anything | Z | – | i

we include the production

Net(X, Y, Z) → Rowᵢ

This means that Net(X, Y, Z), which stands for the hypothetical trip from X to Y at the net cost Z, is really possible by using Rowᵢ alone. It is actualizable in this PDA.

Let us remember that because this is the row language we are generating, 
this production is in the form 

Nonterminal → terminal

In general, we have no guarantee that there are any such rows that push 
nothing, but if no row decreases the size of the STACK, it can never become 
empty and the machine will never accept any words. 

For completeness we restate the expansion rule above. 

Rule 3 For each row in the summary table that has some PUSH, we introduce a whole family of productions. For every row that pushes n characters onto the STACK, such as

FROM | TO | READ | POP | PUSH | ROW
X | Y | anything | Z | m₁m₂ . . . mₙ | i

for all sets of n READ, HERE, or ACCEPT states S₁, . . . , Sₙ, we create the productions

\[ \mathrm{Net}(X, S_n, Z) \to \mathrm{Row}_i\,\mathrm{Net}(Y, S_1, m_1) \cdots \mathrm{Net}(S_{n-1}, S_n, m_n) \]

Remember the fact that we are creating productions does not mean that they 
are all useful in the generation of words. We merely want to guarantee that we 
get all the useful productions, and the useless ones will not hurt us. 

No other productions are necessary. 
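Rules 1 through 3 can be applied mechanically to the summary table of our example PDA. The following minimal Python sketch does so (all names are hypothetical; READ₁ is written READ1). Following the worked example for Row₂ below, it lets the intermediate joints S₁, . . . , Sₙ₋₁ range over READ₁, READ₂, and HERE, and the final joint Sₙ range over those and ACCEPT, since no path can return to START or leave ACCEPT. It makes no attempt to discard useless productions.

```python
from itertools import product

# Summary table: (row, FROM, TO, POP, PUSH); PUSH lists the new STACK top first.
ROWS = [
    ("Row1", "START", "READ1", "$", "$"),
    ("Row2", "READ1", "READ1", "$", "a$"),
    ("Row3", "READ1", "READ1", "a", "aa"),
    ("Row4", "READ1", "HERE", "a", ""),
    ("Row5", "HERE", "READ2", "a", ""),
    ("Row6", "READ2", "HERE", "a", ""),
    ("Row7", "READ2", "ACCEPT", "$", ""),
]
MIDDLE = ["READ1", "READ2", "HERE"]    # joints a path may pass through
FINAL = MIDDLE + ["ACCEPT"]            # joints a path segment may end at

def row_language_productions(rows):
    """Return (lhs, rhs) pairs produced by Rules 1-3; Net nonterminals are tuples."""
    prods = [("S", [("Net", "START", "ACCEPT", "$")])]           # Rule 1
    for name, src, dst, pop, push in rows:
        if not push:                                               # Rule 2
            prods.append((("Net", src, dst, pop), [name]))
            continue
        for middle in product(MIDDLE, repeat=len(push) - 1):       # Rule 3
            for last in FINAL:
                rhs, prev = [name], dst
                for s, m in zip(list(middle) + [last], push):
                    rhs.append(("Net", prev, s, m))
                    prev = s
                prods.append((("Net", src, last, pop), rhs))
    return prods
```

The count agrees with the 33 productions listed for this example, although the sketch emits them in row order rather than in the book's Prod 1, . . . , Prod 33 numbering.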

We shall prove in a moment that these are all the productions in the CFG defining the row language. That is, the language of all sequences of rows representing every word accepted by the machine can be generated by these productions from the start symbol S.

Many productions come from these rules. As we have observed, not all of them are used in the derivation of words, because some of these Net variables can never be realized as actual paths, just as we could include the nonterminal Net(NY, L.A., $5) in the optimistic hope that some airline will run a great sale. Only those nonterminals that can be replaced eventually by strings of solid terminals will ever be used in producing words in the row language. This is like the case with this CFG:

S → X | Y
X → aXX
Y → ab

The production X → aXX is totally useless in producing words.
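Useless productions of this kind can be found with the standard fixed-point computation of generating nonterminals: keep marking a nonterminal as soon as one of its right-hand sides consists only of terminals and already-marked nonterminals, until nothing changes. A minimal Python sketch, assuming a grammar is given as a dictionary from each nonterminal to its right-hand sides, with every symbol that is not a key treated as a terminal (the function names are hypothetical):

```python
def generating(grammar):
    """Return the set of nonterminals that can derive a string of terminals.

    grammar maps each nonterminal to a list of right-hand sides; a right-hand
    side is a list of symbols, and any symbol that is not a key is a terminal.
    """
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs in gen:
                continue
            for rhs in rhss:
                if all(sym in gen or sym not in grammar for sym in rhs):
                    gen.add(lhs)
                    changed = True
                    break
    return gen

def useful_productions(grammar):
    """Drop non-generating nonterminals and every production that mentions one."""
    gen = generating(grammar)
    return {
        lhs: [rhs for rhs in rhss if all(s in gen or s not in grammar for s in rhs)]
        for lhs, rhss in grammar.items()
        if lhs in gen
    }
```

Applied to the grammar above, it marks Y and then S but never X, so X → aXX disappears and only S → Y and Y → ab remain.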



We shall now prove that this CFG with all the Net's is exactly the CFG for the row language. To do that, we need to show two things: First, every string generated by the CFG is a string of rows representing an actual path through the PDA from START to ACCEPT and, second, all the paths corresponding to accepted input strings are equivalent to row words generated by this CFG.

Before we consider this problem in the abstract, let us return to the concrete illustration of the summary table for the PDA that accepts

\(\{a^{2n}b^n\}\)

We shall make a complete list of all the productions that can be formed from the rows of the summary table using the three preceding rules.

Rule 1, always, gives us only the production

Prod 1   S → Net(START, ACCEPT, $)

Rule 2 applies to rows 4, 5, 6, and 7, creating the productions

Prod 2   Net(READ₁, HERE, a) → Row₄
Prod 3   Net(HERE, READ₂, a) → Row₅
Prod 4   Net(READ₂, HERE, a) → Row₆
Prod 5   Net(READ₂, ACCEPT, $) → Row₇

Finally, Rule 3 applies to rows 1, 2, and 3. When applied to Row₁, it generates

Net(START, X, $) → Row₁ Net(READ₁, X, $)

where X can take on the different values READ₁, READ₂, HERE, or ACCEPT. This gives us these four new productions:

Prod 6   Net(START, READ₁, $) → Row₁ Net(READ₁, READ₁, $)
Prod 7   Net(START, READ₂, $) → Row₁ Net(READ₁, READ₂, $)
Prod 8   Net(START, HERE, $) → Row₁ Net(READ₁, HERE, $)
Prod 9   Net(START, ACCEPT, $) → Row₁ Net(READ₁, ACCEPT, $)

When applied to Row₂, Rule 3 generates

Net(READ₁, X, $) → Row₂ Net(READ₁, Y, a) Net(Y, X, $)

where X can be any joint state but START, and Y can be any joint state but START or ACCEPT (because we cannot return to START or leave ACCEPT).

The new productions derived from Row₂ are of the form above with all possible values for X and Y:

Prod 10  Net(READ₁, READ₁, $) → Row₂ Net(READ₁, READ₁, a) Net(READ₁, READ₁, $)
Prod 11  Net(READ₁, READ₁, $) → Row₂ Net(READ₁, READ₂, a) Net(READ₂, READ₁, $)
Prod 12  Net(READ₁, READ₁, $) → Row₂ Net(READ₁, HERE, a) Net(HERE, READ₁, $)
Prod 13  Net(READ₁, READ₂, $) → Row₂ Net(READ₁, READ₁, a) Net(READ₁, READ₂, $)
Prod 14  Net(READ₁, READ₂, $) → Row₂ Net(READ₁, READ₂, a) Net(READ₂, READ₂, $)
Prod 15  Net(READ₁, READ₂, $) → Row₂ Net(READ₁, HERE, a) Net(HERE, READ₂, $)
Prod 16  Net(READ₁, HERE, $) → Row₂ Net(READ₁, READ₁, a) Net(READ₁, HERE, $)
Prod 17  Net(READ₁, HERE, $) → Row₂ Net(READ₁, READ₂, a) Net(READ₂, HERE, $)
Prod 18  Net(READ₁, HERE, $) → Row₂ Net(READ₁, HERE, a) Net(HERE, HERE, $)
Prod 19  Net(READ₁, ACCEPT, $) → Row₂ Net(READ₁, READ₁, a) Net(READ₁, ACCEPT, $)
Prod 20  Net(READ₁, ACCEPT, $) → Row₂ Net(READ₁, READ₂, a) Net(READ₂, ACCEPT, $)
Prod 21  Net(READ₁, ACCEPT, $) → Row₂ Net(READ₁, HERE, a) Net(HERE, ACCEPT, $)

When Rule 3 is applied to Row₃, it generates productions of the form

Net(READ₁, X, a) → Row₃ Net(READ₁, Y, a) Net(Y, X, a)

where X can be READ₁, READ₂, HERE, or ACCEPT and Y can only be READ₁, READ₂, or HERE.

This gives 12 new productions:

Prod 22  Net(READ₁, READ₁, a) → Row₃ Net(READ₁, READ₁, a) Net(READ₁, READ₁, a)
Prod 23  Net(READ₁, READ₁, a) → Row₃ Net(READ₁, READ₂, a) Net(READ₂, READ₁, a)
Prod 24  Net(READ₁, READ₁, a) → Row₃ Net(READ₁, HERE, a) Net(HERE, READ₁, a)
Prod 25  Net(READ₁, READ₂, a) → Row₃ Net(READ₁, READ₁, a) Net(READ₁, READ₂, a)
Prod 26  Net(READ₁, READ₂, a) → Row₃ Net(READ₁, READ₂, a) Net(READ₂, READ₂, a)
Prod 27  Net(READ₁, READ₂, a) → Row₃ Net(READ₁, HERE, a) Net(HERE, READ₂, a)
Prod 28  Net(READ₁, HERE, a) → Row₃ Net(READ₁, READ₁, a) Net(READ₁, HERE, a)
Prod 29  Net(READ₁, HERE, a) → Row₃ Net(READ₁, READ₂, a) Net(READ₂, HERE, a)
Prod 30  Net(READ₁, HERE, a) → Row₃ Net(READ₁, HERE, a) Net(HERE, HERE, a)
Prod 31  Net(READ₁, ACCEPT, a) → Row₃ Net(READ₁, READ₁, a) Net(READ₁, ACCEPT, a)
Prod 32  Net(READ₁, ACCEPT, a) → Row₃ Net(READ₁, READ₂, a) Net(READ₂, ACCEPT, a)
Prod 33  Net(READ₁, ACCEPT, a) → Row₃ Net(READ₁, HERE, a) Net(HERE, ACCEPT, a)

This is the largest CFG we have ever tried to handle. We have

7 terminals: Row₁, Row₂, . . . , Row₇
29 nonterminals: S, 16 of the form Net( , , $), 12 of the form Net( , , a)
33 productions: Prod 1, . . . , Prod 33

We know that not all these will occur in an actual derivation starting at S. For example, Net(READ₂, ACCEPT, a) cannot happen, because to go from READ₂ to ACCEPT, we must pop a $, not an a.
To see which productions can lead toward words, let us begin to draw the leftmost total language tree of the row language. By “leftmost” we mean that from every working-string node we make one branch for each production that applies to the leftmost nonterminal. Branching only on the leftmost nonterminal avoids considerable duplication without losing any words of the language, because all words that can be derived have leftmost derivations (Theorem 27 on p. 284).

In this case, the tree starts simply as

S
 |
Net(START, ACCEPT, $)           (1)
 |
Row₁ Net(READ₁, ACCEPT, $)      (1, 9)

This is because the only production that has S as its left-hand side is Prod 1. The only production that applies after that is Prod 9. The numbers in parentheses at the right show which sequence of productions was used to arrive at each node in the tree. The leftmost (and only) nonterminal now is Net(READ₁, ACCEPT, $). There are exactly three productions that can apply here: Prod 19, Prod 20, and Prod 21. So, the tree now branches as follows:

Row₁ Net(READ₁, ACCEPT, $)                                   (1, 9)
 +-- Row₁ Row₂ Net(READ₁, READ₁, a) Net(READ₁, ACCEPT, $)    (1, 9, 19)
 +-- Row₁ Row₂ Net(READ₁, READ₂, a) Net(READ₂, ACCEPT, $)    (1, 9, 20)
 +-- Row₁ Row₂ Net(READ₁, HERE, a) Net(HERE, ACCEPT, $)      (1, 9, 21)


Let us consider the branch (1, 9, 19). Here, the leftmost nonterminal is Net(READ₁, READ₁, a). The productions that apply to this nonterminal are Prod 22, Prod 23, and Prod 24. Application of Prod 23 gives us an expression that includes Net(READ₂, READ₁, a), but there is no production for which this Net is the left-hand side. (This corresponds to the fact that there are no paths from READ₂ to READ₁ in this PDA.) Therefore, Prod 23 can never be used in the formation of a word in this row language.

This is also true of Prod 24, which creates the expression Net(HERE, READ₁, a). No matter how many times we apply Prod 22, we still have a factor of Net(READ₁, READ₁, a). There is no way to remove this nonterminal from a working string. Therefore, any branch incorporating this nonterminal can never lead to a string of only terminals. The situation is similar to this CFG:

S → b | X
X → aX

We can never get rid of the X. So, we get no words from starting with S → X. Therefore, we might as well drop this nonterminal from consideration.

We could produce just as many words in the row language if we dropped Prod 22, Prod 23, and Prod 24. Therefore, we might as well eliminate Prod 19, because this created the situation that led to these productions, and it can give us no possible lines, only hopeless ones. We now see that we might as well drop the whole branch (1, 9, 19).

Now let us examine the branch (1, 9, 20). The leftmost nonterminal here is Net(READ₁, READ₂, a). The productions that apply to this nonterminal are Prod 25, Prod 26, and Prod 27.

Of these, Prod 25 generates a string that involves Net(READ₁, READ₁, a), which we saw before led to the death of the branch (1, 9, 19). So, Prod 25 is also poison.
We have no reason at the moment not to apply Prod 26 or Prod 27. The tree, therefore, continues:

(1, 9, 20)
 +-- Row₁ Row₂ Row₃ Net(READ₁, READ₂, a) Net(READ₂, READ₂, a) Net(READ₂, ACCEPT, $)   (1, 9, 20, 26)
 +-- Row₁ Row₂ Row₃ Net(READ₁, HERE, a) Net(HERE, READ₂, a) Net(READ₂, ACCEPT, $)     (1, 9, 20, 27)



Let us continue the process along one branch of the tree: 


(1, 9, 20, 27)

⇒ Row₁Row₂Row₃Row₄Net(HERE, READ₂, a)Net(READ₂, ACCEPT, $)    (1, 9, 20, 27, 2)

⇒ Row₁Row₂Row₃Row₄Row₅Net(READ₂, ACCEPT, $)    (1, 9, 20, 27, 2, 3)

⇒ Row₁Row₂Row₃Row₄Row₅Row₇    (1, 9, 20, 27, 2, 3, 5)


This is the shortest word in the entire row language. The total language tree is infinite. 

In this particular case, the proof that this is the CFG for the row language is easy, and it 
reflects the ideas in the general proof that the CFG formed by the three rules we stated is the 
desired CFG. 

For one thing, it is clear that every derivation from these rules is a sequence of rows of 
the summary table that is joint- and STACK-consistent and therefore represents a real path 
through the PDA. 

Now we have to explain why every path through the PDA is derivable from the set of 
productions that these rules create. 

Every word accepted by the PDA is accepted through some path. Every particular path 
is associated with a specific sequence of STACK fluctuations (like a stock value going up 
and down). Every fluctuation is a Net nonterminal. It is either directly the equivalent of a 
Row terminal (if it represents a simple segment in the path), or it can be broken down into a 
sequence of smaller STACK fluctuations. There are rules of production that parallel this decomposition which break the Net nonterminal into a sequence of the other corresponding Net nonterminals. These smaller fluctuations, in turn, can continue to be resolved until we hit only nondecomposable Row terminals, and this sequence of terminals is the path. Therefore, every path through the PDA can be generated from our grammar.

Let us recapitulate the algorithm: 


1. Starting with any PDA as defined in the previous section, we can convert it into conversion form without changing its language.

2. From conversion form, we can build a summary table that has all the information about 
the PDA broken into rows, each of which describes a simple path between joint states 
(READ, HERE, START, and ACCEPT). The rows are of the form 


FROM TO READ POP PUSH ROW 


3. There is a set of rules describing how to create a CFG for the language whose words are
all the row sequences corresponding to all the paths through the PDA that can be taken 
by input strings on their way to acceptance. 

The rules create productions of three forms: 

Rule 1   S → Net(START, ACCEPT, $)

Rule 2   Net(X, Y, Q) → Rowᵢ

Rule 3   Net(A, B, C) → Rowᵢ Net(A, X, Y) . . . Net(Q, B, W)

What we need now to complete the proof of Theorem 31 is to create the CFG that generates the language accepted by the PDA: not just its row language, which is the path language, but the language of strings of a's and b's.

We can finish this off in one simple step. In the summary table, every row had an entry 
that we have ignored until now, that is, the READ column. 

Every row reads a, b, Λ, or Δ from the INPUT TAPE. There is no ambiguity because an edge from a READ state cannot have two labels. So, every row sequence corresponds to a sequence of letters read from the INPUT TAPE. We can convert the row language into the language of the PDA by adding to the CFG for the row language the set of productions created by a new rule, Rule 4.

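Rule 4   For every row of the summary table

FROM   TO   READ   POP   PUSH   ROW

whose ROW entry is i, create the production

Rowᵢ → (the entry in the READ column of that row)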


For example, in the summary table for the PDA that accepts the language {a²ⁿbⁿ}, we have seven rows. Therefore, we create the following seven new productions:

Prod 34   Row₁ → Λ
Prod 35   Row₂ → a
Prod 36   Row₃ → a
Prod 37   Row₄ → b
Prod 38   Row₅ → Λ
Prod 39   Row₆ → b
Prod 40   Row₇ → Δ

The symbols Row₁, Row₂, . . . that used to be terminals in the row language are now nonterminals. From every row sequence we can produce a word. For example,

Row₁Row₂Row₃Row₄Row₅Row₇

becomes

ΛaabΛΔ

Treating Δ like a Λ (to be painfully technical, by the production Δ → Λ), we have the word

aab

Clearly, this word can be accepted by this PDA by following the path

Row₁-Row₂-Row₃-Row₄-Row₅-Row₇
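Rule 4 amounts to a letter-for-row substitution. A small Python sketch, using the seven productions Prod 34 through Prod 40 and mapping both Λ and Δ to the empty string (the "painfully technical" Δ → Λ step); the names `READS` and `word_of` are our own:

```python
# Rule 4 productions for the {a^(2n) b^n} example (Prod 34 - Prod 40).
# "" stands for both Λ and Δ, which contribute no letter to the word.
READS = {1: "", 2: "a", 3: "a", 4: "b", 5: "", 6: "b", 7: ""}

def word_of(rows):
    """Concatenate the letters read along a row sequence."""
    return "".join(READS[r] for r in rows)

print(word_of([1, 2, 3, 4, 5, 7]))  # aab
```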

The derivations of the words from the productions of this CFG not only tell us which words are accepted by this PDA, but also indicate a path by which the words may be accepted, which may be useful information.

Remember that because this is a nondeterministic machine, there may be several paths 
that accept the same word. But for every legitimate word there will be at least one complete 
path to ACCEPT. 

The language generated by this CFG is exactly the language accepted by the PDA originally. Therefore, we may say that for any PDA there is a CFG that generates the same language the machine accepts. ■

EXAMPLE


We shall now illustrate the complete process of equivalence, as given by the two theorems in 
this chapter, on one simple example. We shall start with a CFG and convert it into a PDA 
(using the algorithm of Theorem 30), and we then convert this very PDA back into a CFG 
(using the algorithm of Theorem 31). 

The language of this illustration is the collection of all strings of an even number of a's:

EVENA = (aa)⁺ = a²ⁿ = {aa aaaa aaaaaa . . .}

One obvious grammar for this language is

S → SS | aa

The leftmost total language tree begins: 



Before we can use the algorithm of Theorem 30 to build a PDA that accepts this language, we must put it into CNF. We therefore first employ the algorithm of Theorem 26 (p. 278):

S → SS | AA
A → a
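As a sanity check that the CNF version still describes EVENA, a minimal Python sketch that enumerates every word of length at most 6 by breadth-first leftmost derivation. The uppercase-is-nonterminal encoding and the function name `words_up_to` are assumptions of the sketch:

```python
from collections import deque

# CNF grammar for EVENA from the text: S -> SS | AA, A -> a.
RULES = {"S": ["SS", "AA"], "A": ["a"]}

def words_up_to(max_len, start="S"):
    """Breadth-first leftmost derivations, keeping working strings of length <= max_len.

    No CNF production shortens a working string, so pruning by length is safe.
    """
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        i = next((k for k, ch in enumerate(s) if ch.isupper()), None)
        if i is None:              # no nonterminals left: s is a word
            found.add(s)
            continue
        for rhs in RULES[s[i]]:   # expand the leftmost nonterminal
            t = s[:i] + rhs + s[i + 1:]
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return sorted(found, key=len)

print(words_up_to(6))  # ['aa', 'aaaa', 'aaaaaa']
```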


The PDA we produce by the algorithm of Theorem 30 is 
We shall now use the algorithm of Theorem 31 to turn this machine back into a CFG. First, we must put this PDA into conversion form:

[Figure: the PDA in conversion form, with PUSH, HERE, READ, and ACCEPT states]


Notice that the branching that used to take place at the grand central POP must now take 
place at the grand central HERE. Notice also that because we insist there be a POP after every 
READ, we must have three POPs following READ₁. Who among us is so brazen as to claim to
be able to glance at this machine and identify the language it accepts? 

The next step is to put this PDA into a summary table: 


FROM     TO       READ   POP   PUSH   ROW
START    HERE     Λ      $     S$     1
HERE     HERE     Λ      S     SS     2
HERE     HERE     Λ      S     AA     3
HERE     READ₁    Λ      A     -      4
READ₁    HERE     a      S     S      5
READ₁    HERE     a      $     $      6
READ₁    HERE     a      A     A      7
HERE     READ₂    Λ      $     $      8
READ₂    ACCEPT   Δ      $     -      9
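To see the claim that every derivation corresponds to a joint- and STACK-consistent row sequence, here is a minimal Python sketch that replays a row sequence against this summary table. The row data are our reading of the Rule 2 and Rule 3 productions listed below (for POP and PUSH) and of the Row → letter productions that follow them (for READ). The convention that PUSH strings are written top-first, the ASCII names READ1/READ2, and the sample path for aa are assumptions of the sketch:

```python
# Summary table for the EVENA machine: (FROM, TO, READ, POP, PUSH) per row.
# Assumed conventions: PUSH is written top-first, "" means nothing is pushed,
# and "" in READ stands for Λ or Δ.
TABLE = {
    1: ("START", "HERE", "", "$", "S$"),
    2: ("HERE", "HERE", "", "S", "SS"),
    3: ("HERE", "HERE", "", "S", "AA"),
    4: ("HERE", "READ1", "", "A", ""),
    5: ("READ1", "HERE", "a", "S", "S"),
    6: ("READ1", "HERE", "a", "$", "$"),
    7: ("READ1", "HERE", "a", "A", "A"),
    8: ("HERE", "READ2", "", "$", "$"),
    9: ("READ2", "ACCEPT", "", "$", ""),
}

def run(rows):
    """Check a row sequence for joint- and STACK-consistency; return the word read."""
    state, stack, word = "START", ["$"], []
    for r in rows:
        frm, to, read, pop, push = TABLE[r]
        if frm != state:                      # joint-consistency
            raise ValueError(f"Row {r} starts at {frm}, not {state}")
        if not stack or stack[-1] != pop:     # STACK-consistency
            raise ValueError(f"Row {r} pops {pop}, but the STACK top is {stack[-1] if stack else 'empty'}")
        stack.pop()
        stack.extend(reversed(push))          # leftmost PUSH symbol ends up on top
        word.append(read)
        state = to
    if state != "ACCEPT":
        raise ValueError("path does not end at ACCEPT")
    return "".join(word)

print(run([1, 3, 4, 7, 4, 6, 8, 9]))  # aa
```

A sequence such as Row₁Row₄ is rejected because Row₄ tries to pop A while S is on top of the STACK.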


We are now ready to write out all the productions in the row language. We always begin 
with the production from Rule 1: 

S → Net(START, ACCEPT, $)

There are two rows with no PUSH parts, and they give us by Rule 2

Net(HERE, READ₁, A) → Row₄
Net(READ₂, ACCEPT, $) → Row₉

From Row₁, we get 12 productions of the form

Net(START, X, $) → Row₁Net(HERE, Y, S)Net(Y, X, $)

where X = HERE, READ₁, READ₂, or ACCEPT and Y = HERE, READ₁, or READ₂.
From Row₂, we get eight productions of the form

Net(HERE, X, S) → Row₂Net(HERE, Y, S)Net(Y, X, S)

where X = HERE, READ₁, READ₂, or ACCEPT and Y = HERE or READ₁.

From Row₃, we get eight productions of the form

Net(HERE, X, S) → Row₃Net(HERE, Y, A)Net(Y, X, A)

where X = HERE, READ₁, READ₂, or ACCEPT and Y = HERE or READ₁.

From Row₅, we get the four productions

Net(READ₁, X, S) → Row₅Net(HERE, X, S)

where X = HERE, READ₁, READ₂, or ACCEPT.

From Row₆, we get the four productions

Net(READ₁, X, $) → Row₆Net(HERE, X, $)

where X = HERE, READ₁, READ₂, or ACCEPT.

From Row₇, we get the four productions

Net(READ₁, X, A) → Row₇Net(HERE, X, A)

where X = HERE, READ₁, READ₂, or ACCEPT.

From Row₈, we get the one production

Net(HERE, ACCEPT, $) → Row₈Net(READ₂, ACCEPT, $)

All together, this makes a grammar of 44 productions for the row language. 
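The tally of 44 can be checked mechanically. A short Python sketch that writes out the productions with exactly the X and Y ranges stated above (READ1 and READ2 spell READ₁ and READ₂; the string format is our own):

```python
# Choices for X, as stated in the text for every row with a PUSH.
ENDS = ["HERE", "READ1", "READ2", "ACCEPT"]

prods = ["S -> Net(START, ACCEPT, $)",            # Rule 1
         "Net(HERE, READ1, A) -> Row4",           # Rule 2
         "Net(READ2, ACCEPT, $) -> Row9"]

for X in ENDS:
    for Y in ["HERE", "READ1", "READ2"]:          # Row1: 4 x 3 = 12
        prods.append(f"Net(START, {X}, $) -> Row1 Net(HERE, {Y}, S) Net({Y}, {X}, $)")
    for Y in ["HERE", "READ1"]:                   # Row2 and Row3: 4 x 2 each
        prods.append(f"Net(HERE, {X}, S) -> Row2 Net(HERE, {Y}, S) Net({Y}, {X}, S)")
        prods.append(f"Net(HERE, {X}, S) -> Row3 Net(HERE, {Y}, A) Net({Y}, {X}, A)")
    for row, sym in [(5, "S"), (6, "$"), (7, "A")]:   # Rows 5-7: 4 each
        prods.append(f"Net(READ1, {X}, {sym}) -> Row{row} Net(HERE, {X}, {sym})")

prods.append("Net(HERE, ACCEPT, $) -> Row8 Net(READ2, ACCEPT, $)")   # Row8
print(len(prods))  # 44
```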

To obtain the grammar for the actual language of the PDA, we must also include the following productions:

Row₁ → Λ
Row₂ → Λ
Row₃ → Λ
Row₄ → Λ
Row₅ → a
Row₆ → a
Row₇ → a
Row₈ → Λ
Row₉ → Δ

This is not exactly the two-production grammar for EVENA we started with. We seem to have made a profit.

Before finishing our discussion of Theorem 31, we should say a word about condition 8 in the definition of conversion form. On the surface, it seems that we never made use of this property of the PDA in our construction of the CFG. We did not. However, it is an important factor in showing that the CFG generates the language accepted by the machine. According to our definition of PDA, it is possible for a machine to accept an input string without reading the whole string. Because the final strings come from the row language, and represent paths to ACCEPT, only that part of the input string corresponding to a path to ACCEPT could be generated by the grammar. If a particular input is only accepted by paths that do not read all its letters, then the grammar resulting from the conversion algorithm would not generate this word.

PROBLEMS

For each of the CFGs below in Problems 1 through 8, construct a PDA that accepts the same language they generate, using the algorithm of Theorem 30.

1. (i) S → aSbb | abb

(ii) S → SS | a | b

2. S → XaaX
X → aX | bX | Λ

3. S → aS | aSbS | a

4. S → XY

X → aX | bX | a
Y → Ya | Yb | a

5. S → Xa | Yb
X → Sb | b
Y → Sa | a

6. (i) S → Saa | aSa | aaS

(ii) How many words of length 12 are there in this language?

7. (i) S → (S)(S) | a

Parentheses are terminals here.

(ii) How many words are there in this language with exactly four a's?

8. (i) S → XaY | YbX

X → YY | aY | b

Y → b | bb

(ii) Draw the total language tree.

9. Explain briefly why it is not actually necessary to convert a CFG into CNF to use the algorithm of Theorem 30 to build a PDA that accepts the same language.



10. Let us consider the set of all regular expressions to be a language over the alphabet

Σ = {a b ( ) + * Λ}

Let us call this language REGEX.

(i) Prove that REGEX is nonregular if you don't do this already on p. 286.

(ii) Prove that REGEX is context-free by producing a grammar for it. 

(iii) Draw a PDA that accepts REGEX. 

(iv) Draw a deterministic PDA that accepts REGEX. 

11. (i) Draw a PDA in conversion form that has twice as many READ states as POP states. 

(ii) Draw a PDA in conversion form that has twice as many POP states as READ states. 

12. (i) In a summary table for a PDA, can there be more rows with PUSH than rows with no PUSH?

(ii) In a summary table for a PDA, can there be more rows that PUSH more than one 
letter than there are rows that PUSH no letter? 

(iii) On a path through a PDA generated by a word in the language of the PDA, can there 
be more rows that PUSH more than one letter than rows that PUSH no letters? 

13. Consider this PDA: 



(i) What is the language of words it accepts? 

(ii) Put it into conversion form. 

(iii) Build a summary table for this PDA. 

14. (i) Write out the CFG for the row language of the PDA in Problem 13. 

(ii) Write out the CFG for the language accepted by this machine. 

15. Starting with the CFG for {aⁿbⁿ}

S → aSb | ab

(i) Put this CFG into CNF. 

(ii) Take this CNF and make a PDA that accepts this language. 

16. (i) Take the PDA of Problem 15 and put it into conversion form. (Feel free to eliminate useless paths and states.)

(ii) Build a summary table for this PDA. 
17. (i) From the summary table produced in Problem 16, write out the productions of the CFG that generate the row language of the PDA.

(ii) Convert this to the CFG that generates the actual language of the PDA (not the row 
language). 

18. Prove that every context-free language over the alphabet Σ = {a b} can be accepted by a PDA with three READ states.

19. Prove that for any PDA there is another PDA that accepts exactly the same language but 
has only one POP state. 

20. Show that if the algorithm of Theorem 30 produces a deterministic PDA, then the language has only one word in it.


CHAPTER 16 


Non-Context-Free 

Languages 


SELF-EMBEDDEDNESS 

We are now going to answer the most important question about context-free languages: Are 
all languages context-free? As any student who realizes that we are only in Part II of a three-part book knows, the answer is no.

To prove this, we have to make a very careful study of the mechanics of word production from grammars. Let us consider a CFG that is in Chomsky Normal Form. All its productions are of the two forms

Nonterminal → Nonterminal Nonterminal
Nonterminal → terminal


THEOREM 32 

Let G be a CFG in Chomsky Normal Form. Let us call the productions of the form

Nonterminal → Nonterminal Nonterminal

live and the productions of the form

Nonterminal → terminal

dead.

If we are restricted to using the live productions at most once each, we can generate only finitely many words.

PROOF

The question we shall consider is: How many nonterminals are there in the working strings at different stages in the production of a word?

Suppose we start (in some abstract CFG in CNF that we need not specify) with

S ⇒ AB
The right side, the working string, has exactly two nonterminals. If we apply the live production

A → XY

we get

⇒ XYB

which has three nonterminals. Now applying the dead production

X → b

we get

⇒ bYB

with two nonterminals. But now applying the live production

Y → SX

we get

⇒ bSXB

with three nonterminals again.

Every time we apply a live production, we increase the number of nonterminals by one. Every time we apply a dead production, we decrease the number of nonterminals by one. Because the net result of a derivation is to start with one nonterminal S and end up with none (a word of solid terminals), the net effect is to lose a nonterminal. Therefore, in all cases, to arrive at a string of only terminals, we must apply one more dead production than live production. This is true no matter in what order the productions are applied.

For example (again these derivations are in some arbitrary, uninteresting CFGs):

S ⇒ b                                                0 live, 1 dead

or

S ⇒ XY ⇒ aY ⇒ aa                                     1 live, 2 dead

or

S ⇒ AB ⇒ XYB ⇒ bYB ⇒ bSXB ⇒ baXB ⇒ baaB ⇒ baab       3 live, 4 dead

Let us suppose that the grammar G has exactly

p live productions

and

q dead productions

Because any derivation that does not reuse a live production can have at most p live productions, it must have at most (p + 1) dead productions. Each letter in the final word comes from the application of some dead production. Therefore, all words generated from G without repeating any live productions have at most (p + 1) letters in them.

Therefore, we have shown that the words of the type described in this theorem cannot be more than (p + 1) letters long. Therefore, there can be at most finitely many of them. ■

Notice that this proof applies to any derivation, not just leftmost derivations.

When we start with a CFG in CNF, in all leftmost derivations, each intermediate step is a working string of the form

⇒ (string of solid terminals)(string of solid Nonterminals)

This is a special property of leftmost Chomsky working strings, as we saw on p. 284. Let us consider some arbitrary, unspecified CFG in CNF.

Suppose that we employ some live production, say,

Z → XY

twice in the derivation of some word w in this language. That means that at one point in the derivation, just before the duplicated production was used the first time, the leftmost Chomsky working string had the form

⇒ (s₁) Z (s₂)

where s₁ is a string of terminals and s₂ is a string of nonterminals. At this point, the leftmost nonterminal is Z. We now replace this Z with XY according to the production and continue the derivation. Because we are going to apply this production again at some later point, the leftmost Chomsky working string will sometimes have the form

⇒ (s₁)(s₃) Z (s₄)

where s₁ is the same string of terminals unchanged from before (once the terminals have been derived in the front, they stay put; nothing can dislodge them), s₃ is a newly formed string of terminals, and s₄ is the string of nonterminals remaining (it is a suffix of s₂). We are now about to apply the production Z → XY for the second time.

Where did this second Z come from? Either the second Z is a tree descendant of the first Z, or else it comes from something in the old s₂. By the phrase “tree descendant,” we mean that in the derivation tree there is an ever-downward path from one Z to the other.

Let us look at an example of each possibility. 


Case 1 

In the arbitrary grammar 

S → AZ

Z → BB

B → ZA

A → a

B → b

as we proceed with the derivation of some word, we find

S ⇒ AZ
⇒ aZ
⇒ aBB
⇒ abB

[Figure: derivation tree for this word]

As we see from the derivation tree, the second Z was derived (descended) from the first. We can see this from the diagram because there is a downward path from the first Z to the second.

On the other hand, we could have something like in Case 2. 

Case 2 

In the arbitrary grammar 

A → BC
C → BB
A → a
B → b

as we proceed with the derivation of some word, we find 


Two times the leftmost nonterminal is B, but the second B is not descended from the first B in the tree. There is no downward tree path from the first B to the second B. Because a grammar in CNF replaces every nonterminal with one or two symbols, the derivation tree of each word has the property that every node has one or two descendants. Such a tree is called a binary tree and should be very familiar to students of computer science. When we consider the derivation tree, we no longer distinguish leftmost derivations from any other sequence of nonterminal replacements.

We shall now show that in an infinite language we can always find an example of Case 1.

THEOREM 33


If G is a CFG in CNF that has p live productions and q dead productions, and if w is a word generated by G that has more than 2ᵖ letters in it, then somewhere in every derivation tree for w there is an example of some nonterminal (call it Z) being used twice where the second Z is descended from the first Z.

PROOF

Why did we include the arithmetical condition that

length(w) > 2ᵖ?
This condition ensures that the production tree for w has more than p rows (generations). This is because at each row in the derivation tree the number of symbols in the working string can at most double the last row.

For example, in some abstract CFG in CNF we may have a derivation tree that looks 
like this: 



[Figure: a binary derivation tree in an abstract CNF grammar]


(In this figure, the nonterminals are chosen completely arbitrarily.) If the bottom row has more than 2ᵖ letters, the tree must have more than p + 1 rows.
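The counting behind this claim can be written out as follows. We number the rows of the tree from 1 (the row holding S) down to k (the deepest row); since every node has at most two children, row j holds at most 2ʲ⁻¹ symbols, and the letters of w (the leaves) number at most 2ᵏ⁻¹:

```latex
% k = number of rows in the derivation tree, with S in row 1.
% Each node has at most two children, so the tree has at most 2^{k-1} leaves.
\[
2^{p} < \operatorname{length}(w) \le 2^{k-1}
\quad\Longrightarrow\quad p < k - 1
\quad\Longrightarrow\quad k > p + 1 .
\]
```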

Let us consider any terminal that was one of the letters formed on the bottom row of the 
derivation tree by a dead production, say, 

X → b


The letter b is not necessarily the rightmost letter in w, but it is a letter formed after more 
than p generations of the tree. This means that it has more than p direct ancestors up the tree. 

From the letter b, we trace our way back up through the tree to the top, which is the start symbol S. In this backward trace, we encounter one nonterminal after another in the inverse order in which they occurred in the derivation. Each of these nonterminals represents a production. If there are more than p rows to retrace, then there have been more than p productions in the ancestor path from b to S.

But there are only p different live productions possible in the grammar G, so if more than p 
have been used in this ancestor path, then some live productions have been used more than once. 

The nonterminal on the left side of this repeated live production has the property that it 
occurs twice (or more) on the descent line from S to b. This then is a nonterminal that proves 
our theorem. 

Before stamping the end-of-proof box, let us draw an illustration, a totally arbitrary tree for a word w in a grammar we have not even written out:

[Figure: an arbitrary derivation tree for w, with one terminal a circled and two X's boxed]
The word w is babaababa. Let us trace the ancestor path of the circled terminal a from the bottom row up:

a came from Y by the production Y → a
Y came from X by the production X → BY
X came from S by the production S → XY
S came from B by the production B → SX
B came from X by the production X → BY
X came from S by the production S → XY

If the ancestor chain is long enough, one production must be used twice. In this example, both X → BY and S → XY are used twice. The two X's that have boxes drawn around them satisfy the conditions of the theorem. One of them is descended from the other in the derivation tree of w. ■
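The pigeonhole step can be mimicked on the ancestor list above. A small Python sketch; separating live from dead productions by the length of the right side (two symbols versus one) is an assumption that holds for CNF, and the variable names are our own:

```python
from collections import Counter

# Ancestor path of the circled a, read from the bottom of the tree upward,
# exactly as listed in the text.
PATH = ["Y -> a", "X -> BY", "S -> XY", "B -> SX", "X -> BY", "S -> XY"]

# In CNF a live production has a two-symbol right side.
live = [p for p in PATH if len(p.split(" -> ")[1]) == 2]
repeated = sorted(p for p, n in Counter(live).items() if n > 1)
print(repeated)  # ['S -> XY', 'X -> BY']
```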

DEFINITION

In a given derivation of a word in a given CFG, a nonterminal is said to be self-embedded if it ever occurs as a tree descendant of itself. ■
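To make the definition concrete, a small Python sketch that searches a derivation tree for self-embedded nonterminals. The sample tree is the NONNULLPALINDROME derivation of aabaa used in the next example, written in a nested-tuple encoding of our own; `self_embedded` is a hypothetical helper name:

```python
# Derivation tree for aabaa in the NONNULLPALINDROME grammar:
#   (N, left, right) for N -> left right,  (N, "t") for N -> t.
TREE = ("S",
        ("A", "a"),
        ("X",
         ("S",
          ("A", "a"),
          ("X", ("S", "b"), ("A", "a"))),
         ("A", "a")))

def self_embedded(tree, ancestors=()):
    """Return the nonterminals that occur as tree descendants of themselves."""
    label = tree[0]
    found = {label} if label in ancestors else set()
    if len(tree) == 3:                     # live production: visit both children
        for child in tree[1:]:
            found |= self_embedded(child, ancestors + (label,))
    return found

print(sorted(self_embedded(TREE)))  # ['S', 'X']
```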

Theorem 33 (p. 354) says that in any CFG all sufficiently long words have leftmost derivations that include a self-embedded nonterminal.


EXAMPLE 




Consider the CFG for NONNULLPALINDROME in CNF: 


S → AX

X → SA

S → BY

Y → SB

S → a

S → b
S → AA
S → BB
A → a
B → b


There are six live productions, so according to Theorem 33, it would require a word of more than 2⁶ = 64 letters to guarantee that each derivation has a self-embedded nonterminal in it.

If we are only looking for one example of a self-embedded nonterminal, we can find 
such a tree much more easily than that. Consider this derivation tree for the word aabaa:

[Figure: derivation tree for aabaa, with its levels marked from Level 1 to Level 6]
This tree has six levels, so it cannot quite guarantee a self-embedded nonterminal, but it has one anyway. Let us begin with the b on level 6 and trace its path back up to the top:

“The b came from S, which came from X, which came from S, which came from X, which came from S.”

In this way, we find that the production X → SA was used twice in this tree segment:




The tree above proceeds from S down to the first X. Then from the second X the tree proceeds to the final word. But once we have reached the second X, instead of proceeding with the generation of the word as we have it here, we could instead have repeated the same sequence of productions that the first X initiated, thereby arriving at a third X. The second X can cause the third exactly as the first caused the second. From this third X, we could proceed to a final string of all terminals in a manner exactly as the second X did.

Let us review this logic more slowly. The first X can start a subtree that produces the second X, and the second X can start a subtree that produces all terminals, but it does not have to. Instead, the second X can begin a subtree exactly like the first X. This will then produce a third X. From this third X, we can produce a string of all terminals as the second X used to.


[Figure: left, the original tree with the X-subtree indicated; right, the modified tree with the whole X-subtree hanging from where the second X was]


This modified tree must be a completely acceptable derivation tree in the original language because each node is still replaced by one or two nodes according to the rules of production found in the first tree.

The modified tree still has a last X and we can play our trick again. Instead of letting this X proceed to a subword as in the first tree, we can replace it by yet another copy of the original X-subtree.










All these trees must be derivation trees of some words in the language in which the original tree started because they reflect only those productions already present in the original tree, just in a different arrangement. We can play this trick as many times as we want, but what words will we then produce?

The original tree produced the word aabaa, but it is more important to note that from S we could produce the working string aX, and from this X we could produce the working string aXa. Then from the second X we eventually produced the subword ba.

Let us introduce some new notation to facilitate our discussion. 


DEFINITION 


Let us introduce the notation ⇒* to stand for the phrase “can eventually produce.” It is used in the following context: Suppose in a certain CFG the working string S₁ can produce the working string S₂, which in turn can produce the working string S₃ . . . , which in turn can produce the working string Sₙ:

S₁ ⇒ S₂ ⇒ S₃ ⇒ . . . ⇒ Sₙ

Then we can write

S₁ ⇒* Sₙ ■

Using this notation, we can write that in this CFG the following are true:

S ⇒* aX,   X ⇒* aXa,   X ⇒* ba

It will be interesting for us to reiterate the middle step, since if X ⇒* aXa, then X ⇒* aaXaa and X ⇒* aaaXaaa and so on.

In general,

X ⇒* aⁿXaⁿ

We can then produce words in this CFG starting with S ⇒* aX and finishing with X ⇒* ba, with these extra iterations in the middle:

S ⇒* aX ⇒* aaXa ⇒* aabaa
S ⇒* aX ⇒* aaaXaa ⇒* aaabaaa
S ⇒* aX ⇒* aaaaXaaa ⇒* aaaabaaaa
S ⇒* aX ⇒* aaⁿXaⁿ ⇒* aaⁿbaaⁿ

Given any derivation tree in any CFG with a self-embedded nonterminal, we can use the iterative trick above to produce an infinite family of other words in the language. ■
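The iteration can be imitated with plain string substitution, where replacing X by aXa stands in for hanging another copy of the X-subtree. A minimal Python sketch; the helper name `derive` is hypothetical:

```python
def derive(n):
    """S =>* aX, then apply X =>* aXa n times, then finish with X =>* ba."""
    s = "aX"
    for _ in range(n):
        s = s.replace("X", "aXa")   # one more copy of the X-subtree
    return s.replace("X", "ba")

words = [derive(n) for n in range(1, 5)]
print(words)  # ['aabaa', 'aaabaaa', 'aaaabaaaa', 'aaaaabaaaaa']
```

Every word produced this way is a palindrome, as it must be for NONNULLPALINDROME.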


EXAMPLE 

For the arbitrary CFG, 


S → AB
A → BC
C → AB
A → a
B → b


One possible derivation tree is

[Figure: a derivation tree in which the self-embedded nonterminal A appears inside two identical dashed triangles]


In this case, we find the self-embedded nonterminal A in the dashed triangle. Not only is A 
self-embedded, but it has already been used twice the same way (two identical dashed triangles). 
Again, we have the option of repeating the sequence of productions in the triangle as many times as we want:

[Figure: the derivation tree with the dashed A-triangle repeated several times]

Each iteration will produce new and longer words, all of which must belong to the original language.

This is why in the last theorem it was important that the repeated nonterminals be along 
the same line of descent. 


THE PUMPING LEMMA FOR CFLs 

This entire situation is analogous to the multiply reiterative pumping lemma of Chapter 1 
so it should be no surprise that this technique was discovered by the same people: Bar-Hillel, Perles, and Shamir. The following theorem, called “the pumping lemma for context-free languages,” states the consequences of reiterating a sequence of productions from a self-embedded nonterminal.


THEOREM 34 

If G is any CFG in CNF with p live productions and w is any word generated by G with length greater than 2ᵖ, then we can break up w into five substrings:

w = uvxyz

such that x is not Λ and v and y are not both Λ and such that all the words

uvxyz

uvvxyyz

uvvvxyyyz

uvvvvxyyyyz

. . .

uvⁿxyⁿz

can also be generated by G.


PROOF 

From our previous theorem, we know that if the length of w is greater than 2ᵖ, then there are always self-embedded nonterminals in any derivation tree for w.

Let us now fix in our minds one specific derivation of w in G. Let us call one self-embedded nonterminal P, whose first production is P → QR.

Let us suppose that the tree for w looks like this:

[Figure: derivation tree for w, with a triangle enclosing the part generated from the first P down to the second P]
The triangle indicated encloses the whole part of the tree generated from the first P down to where the second P is produced.

Let us divide w into these five parts:

u = the substring of all the letters of w generated to the left of the triangle above (this may be Λ)

v = the substring of all the letters of w descended from the first P but to the left of the letters generated by the second P (this may be Λ)

x = the substring of w descended from the lower P (this may not be Λ because this nonterminal must turn into some terminals)

y = the substring of w of all letters generated by the first P but to the right of the letters descending from the second P (this may be Λ, but as we shall see, not if v = Λ)

z = the substring of all the letters of w generated to the right of the triangle (this may be Λ)

Pictorially, 



For example, the following is a complete tree in an unspecified grammar: 
It is possible that either u or z or both might be Λ, as in the following example where S is itself the self-embedded nonterminal and all the letters of w are generated inside the triangle:

[Figure: a derivation tree in which S is the self-embedded nonterminal]

v = Λ   x = ba   y = a   z = Λ


However, either v is not Λ, y is not Λ, or both are not Λ. This is because in the picture

[Figure: the upper P branching into its two children Q and R]

even though the lower P can come from the upper Q or from the upper R, there must still be some other letters in w that come from the other branch, the branch that does not produce this P.

This is important, because if it were ever possible that

v = y = Λ

then uvⁿxyⁿz would be the same word uxz for every n, which would not be an interesting collection of words.

Now let us ask ourselves, what happens to the end word if we change the derivation tree by reiterating the productions inside the triangle? In particular, what is the word generated by this doubled tree?

[Figure: the derivation tree with the triangle doubled]

As we can see from the picture, we shall be generating the word

uvvxyyz

Remember that u, v, x, y, and z are all strings of a's and b's, and this is another word generated by the same grammar. The u-part comes from S to the left of the whole triangle. The first v is what comes from inside the first triangle to the left of the second P. The second v comes from the stuff in the second triangle to the left of the third P. The x-part comes from the third P. The first y-part comes from the stuff in the second triangle to the right of the third P. The second y comes from the stuff in the first triangle to the right of the second P. The z, as before, comes from S from the stuff to the right of the first triangle.

If we tripled the triangle, we would get 



[Figure: the derivation tree with the triangle tripled; its leaves read u v v v x y y y z]


which is a derivation tree for the word 

uvvvxyyyz 

which must therefore also be in the language generated by G. 

In general, if we repeat the triangle n times, we get a derivation tree for the word 

uvⁿxyⁿz

which must therefore also be in the language generated by G. ■ 
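A quick way to watch the theorem at work is to pump the decomposition of aabaa from the NONNULLPALINDROME example (u = a, v = a, x = ba, y = a, z = Λ, read off S ⇒* aX, X ⇒* aXa, X ⇒* ba) and test each result for membership. The membership test below is the standard CYK (Cocke-Younger-Kasami) algorithm for CNF grammars; the text does not use it, and we swap it in here only as a checker. Names such as `cyk`, `BINARY`, and `TERMINAL` are our own:

```python
# CNF grammar for NONNULLPALINDROME, as listed in the earlier example.
BINARY = [("S", "A", "X"), ("X", "S", "A"), ("S", "B", "Y"),
          ("Y", "S", "B"), ("S", "A", "A"), ("S", "B", "B")]
TERMINAL = [("S", "a"), ("S", "b"), ("A", "a"), ("B", "b")]

def cyk(word, start="S"):
    """CYK membership test: does the CNF grammar generate the nonempty `word`?"""
    n = len(word)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals that derive word[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, t in TERMINAL if t == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in BINARY:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Decomposition of aabaa read off the tree: S =>* aX, X =>* aXa, X =>* ba.
u, v, x, y, z = "a", "a", "ba", "a", ""
for n in range(4):
    pumped = u + v * n + x + y * n + z
    print(n, pumped, cyk(pumped))   # each line ends in True
```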

We can also use our ⇒* symbol to provide an algebraic proof of this theorem.

S ⇒* uPz ⇒* uvPyz ⇒* uvxyz = w

This new symbol is the nexus of two of our old concepts: the derivation => and the clo¬ 
sure *, meaning as many repetitions as we want. The idea of “eventually producing” was in¬ 
herent in our concept of nullable. Using our new symbolism, we can write 

N is nullable if $N \overset{*}{\Rightarrow} \Lambda$
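The phrase “eventually produces Λ” can be checked mechanically. The following is a minimal Python sketch, assuming a grammar is stored as a dict from each nonterminal to a list of right-hand sides (tuples of symbols, with the empty tuple standing for Λ); this encoding, the function name nullable_nonterminals, and the sample grammar are illustrative assumptions, not notation from the text. It keeps marking N as nullable whenever some right side of N consists only of symbols already known to be nullable, until a full pass finds nothing new.

```python
def nullable_nonterminals(grammar):
    """Return the set of nonterminals N such that N =>* Lambda.

    grammar maps each nonterminal to a list of right-hand sides; each right
    side is a tuple of symbols, and the empty tuple () stands for Lambda.
    Any symbol that is not a key of the dict is treated as a terminal.
    """
    nullable = set()
    changed = True
    while changed:                      # repeat until a pass adds nothing
        changed = False
        for nonterminal, right_sides in grammar.items():
            if nonterminal in nullable:
                continue
            # N is nullable if some right side is made only of nullable
            # nonterminals; an empty right side () qualifies vacuously.
            if any(all(sym in nullable for sym in rhs) for rhs in right_sides):
                nullable.add(nonterminal)
                changed = True
    return nullable


# Hypothetical grammar:  S -> AB | a,   A -> aA | Lambda,   B -> AA | b
sample = {
    "S": [("A", "B"), ("a",)],
    "A": [("a", "A"), ()],
    "B": [("A", "A"), ("b",)],
}
print(sorted(nullable_nonterminals(sample)))   # ['A', 'B', 'S']
```

Terminals can never become nullable here, because only keys of the dict are ever added to the set.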

We can also give an algebraic definition of self-embedded nonterminals. 

DEFINITION (Second) 


In a particular CFG, a nonterminal N is called self-embedded in the derivation of a word w
if there are strings of terminals v and y, not both null, such that

$N \overset{*}{\Rightarrow} vNy$ ■
PROOF 2


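If P is a self-embedded nonterminal in the derivation of w, then

$S \overset{*}{\Rightarrow} uPz$

for some u and z, both substrings of w. Also,

$P \overset{*}{\Rightarrow} vPy$

for some v and y, both substrings of w, and finally,

$P \overset{*}{\Rightarrow} x$

where x is another substring of w.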
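But we may also write

$S \overset{*}{\Rightarrow} uPz$
$\overset{*}{\Rightarrow} uvPyz$
$\overset{*}{\Rightarrow} uvvPyyz$
$\overset{*}{\Rightarrow} uvvvPyyyz$
$\ldots$
$\overset{*}{\Rightarrow} uv^nPy^nz$
$\overset{*}{\Rightarrow} uv^nxy^nz$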


So, the strings in this last set are all words derivable in the original CFG. ■


Some people are more comfortable with the algebraic argument and some are more 
comfortable reasoning from diagrams. Both techniques can be mathematically rigorous and
informative. There is no need for a blood feud between the two camps. 


EXAMPLE 


We shall analyze a specific case in detail and then consider the situation in its full generality. 
Let us consider the following CFG in CNF:

$S \to PQ$
$Q \to QS \mid b$
$P \to a$

The word abab can be derived from these productions by the following derivation tree:
Here, we see three instances of self-embedded nonterminals. The top S has another S as
a descendant. The Q on the second level has two Q’s as descendants, one on the third level
and one on the fourth level. Notice, however, that the two P’s are not descended one from the
other, so neither is self-embedded. For the purposes of our example, we shall focus on the
self-embedded Q’s of the second and third levels, although it would be just as good to look
at the self-embedded S’s. The first Q is replaced by the production $Q \to QS$, whereas the second
is replaced by the production $Q \to b$. Even though the two Q’s are not replaced by the
same productions, they are self-embedded and we can apply the technique of this theorem.
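Self-embedding on a derivation tree can also be detected mechanically. Below is a small Python sketch, assuming (our choice, not the book's) that a tree is written as a nested tuple (label, child, ...), that terminals are one-letter strings, and that the root is on level 1. The helper names self_embedded_pairs and word_of are hypothetical. Applied to the tree for abab, it reports the same three self-embedded pairs described above and no pair of P’s.

```python
def self_embedded_pairs(tree, level=1, ancestors=()):
    """Return (label, upper_level, lower_level) for every node whose label
    also appears on one of its ancestors, i.e. every self-embedded pair.

    A tree is a tuple (label, child, child, ...); a leaf is a one-letter
    string (a terminal). The root is on level 1, as in the text.
    """
    if isinstance(tree, str):          # a terminal letter has no label to match
        return []
    label, *children = tree
    pairs = [(label, up_level, level)
             for up_label, up_level in ancestors if up_label == label]
    for child in children:
        pairs += self_embedded_pairs(child, level + 1,
                                     ancestors + ((label, level),))
    return pairs


def word_of(tree):
    """Read the terminals along the bottom of the tree from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(word_of(child) for child in tree[1:])


# The derivation tree for abab in the grammar S -> PQ, Q -> QS | b, P -> a
tree = ("S", ("P", "a"),
             ("Q", ("Q", "b"),
                   ("S", ("P", "a"), ("Q", "b"))))
print(word_of(tree))              # abab
print(self_embedded_pairs(tree))  # [('Q', 2, 3), ('S', 1, 3), ('Q', 2, 4)]
```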

If we draw this derivation: 

$S \Rightarrow PQ$
$\Rightarrow aQ$
$\Rightarrow aQS$
$\Rightarrow abS$
$\Rightarrow abPQ$
$\Rightarrow abaQ$
$\Rightarrow abab$

we can see that the word w can be broken into the five parts uvxyz as follows: 


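[Figure: the derivation tree for abab with a triangle drawn from the upper Q to the lower Q; the bottom row is labeled u, x, y]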


We have located a self-embedded nonterminal Q and we have drawn a triangle enclosing
the descent from Q to Q. The u-part is the part generated by the tree to the left of the
triangle. This is only the letter a. The v-part is the substring of w generated inside the triangle
to the left of the repeated nonterminal. Here, however, the repeated nonterminal Q is the
leftmost character on the bottom of the triangle. Therefore, $v = \Lambda$. The x-part is the substring
of w descended directly from the second occurrence of the repeated nonterminal (the
second Q). Here, that is clearly the single letter b. The y-part is the rest of w generated inside
the triangle, that is, whatever comes from the triangle to the right of the repeated nonterminal.
In this example, this refers to the substring ab. The z-part is all that is left of w,
that is, the substring of w that is generated to the right of the triangle. In this case, that is
nothing: $z = \Lambda$.

$u = a, \quad v = \Lambda, \quad x = b, \quad y = ab, \quad z = \Lambda$
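To see the pumping concretely, the sketch below builds the strings $uv^nxy^nz$ from the five parts just found and tests each one for membership in the grammar $S \to PQ$, $Q \to QS \mid b$, $P \to a$. Membership is decided with the CYK table-filling algorithm for grammars in CNF. That algorithm is a standard technique, but it is not introduced in this passage, and the dictionary encoding of the grammar and the function names are our own assumptions.

```python
# Hypothetical encoding of the CNF grammar  S -> PQ,  Q -> QS | b,  P -> a
BINARY = {("P", "Q"): {"S"}, ("Q", "S"): {"Q"}}   # (B, C) -> {A : A -> BC}
TERMINAL = {"a": {"P"}, "b": {"Q"}}               # letter -> {A : A -> letter}


def cyk(word, start="S"):
    """CYK membership test for the CNF grammar above."""
    n = len(word)
    if n == 0:
        return False                 # this CNF grammar has no Lambda-productions
    # table[i][l] holds the nonterminals that derive word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, letter in enumerate(word):
        table[i][0] = set(TERMINAL.get(letter, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):       # left part has `split` letters
                for left in table[i][split - 1]:
                    for right in table[i + split][length - split - 1]:
                        table[i][length - 1] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


def pump(u, v, x, y, z, n):
    """Return u v^n x y^n z."""
    return u + v * n + x + y * n + z


u, v, x, y, z = "a", "", "b", "ab", ""    # the decomposition found above
for n in range(4):
    word = pump(u, v, x, y, z, n)
    print(n, word, cyk(word))
# 0 ab True
# 1 abab True
# 2 ababab True
# 3 abababab True
```

Because v is empty here, only the y-part visibly repeats, just as in the doubled and tripled trees that follow.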

The following diagram shows what would happen if we repeated the triangle from the 
second Q just as it descends from the first Q:
If we now fill in the picture by adding the terminals that descend from the P, Q, and S’s,
as we did in the original tree, we complete the new derivation tree as follows:



[Figure: the completed derivation tree with the triangle doubled]


Here, we can see that the repetition of the triangle does not affect the u-part. There was
one u-part and there still is only one u-part. If there were a z-part, that too would be left
alone, because these are defined outside the triangle. There is no v-part in this example, but
we can see that the y-part (its right-side counterpart) has become doubled. Each of the two
triangles generates exactly the same y-part. In the middle of all this, the x-part has been left
alone. There is still only one bottom repeated nonterminal from which the x-part descends.
The word with this derivation tree can be written as uvvxyyz:

$uvvxyyz = a\,\Lambda\,\Lambda\,b\,ab\,ab\,\Lambda$
$= ababab$

If we had tripled the triangle instead of only doubling it, we would obtain 


[Figure: the derivation tree with the triangle tripled]




This word we can easily recognize as 

$uvvvxyyyz = a\,\Lambda\,\Lambda\,\Lambda\,b\,ab\,ab\,ab\,\Lambda = abababab$

In general, after n iterations of the triangle, we obtain a derivation of the word 


$uv^nxy^nz$


We draw one last generalized picture: 


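[Figure: generalized derivation tree; S yields u and z outside the triangle from the upper P to the lower P, the triangle yields v and y, and the lower P yields x, so the bottom row reads u v x y z]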


Pumped twice, it becomes 



As before, the reason this is called the pumping lemma and not the pumping theorem is 
that it is to be used for some presumably greater purpose. In particular, it is used to prove
that certain languages are not context-free or, as we shall say, they are non-context-free. 


EXAMPLE 


Let us consider the language 

$\{a^nb^na^n \text{ for } n = 1, 2, 3, \ldots\}$

= {aba aabbaa aaabbbaaa . . .}

Let us think about how this language could be accepted by a PDA. As we read the first a’s,
we must accurately store the information about exactly how many a’s there were, because
$a^mb^na^n$ with $m \neq n$ must be rejected but $a^nb^na^n$ must be accepted. We can put this count into the
STACK. One obvious way is to put the a’s themselves directly into the STACK, but there may
be other ways of doing this. Next, we read the b’s and we have to ask the STACK whether or
not the number of b’s is the same as the number of a’s. The problem is that asking the STACK
this question makes the STACK forget the answer afterward, because we pop stuff out and cannot
put it back. There is no temporary storage possible for the information that we have popped
out. The method we used to recognize the language $\{a^nb^n\}$ was to store the a’s in the STACK
and then destroy them one for one with the b’s. After we have checked that we have the correct
number of b’s, the STACK is empty. No record remains of how many a’s there were originally.
Therefore, we can no longer check whether the last clump of a’s in $a^nb^na^n$ is the correct size. In
answering the question for the b’s, the information was lost. This STACK is like a student who
forgets the entire course after the final exam.
$uv^2xy^2z$

has more than one such substring, which no word in $\{a^nb^na^n\}$ does. Therefore, neither v nor
y contains ba.

Conclusion 

The only possibility left is that v and y must each be all a’s, all b’s, or $\Lambda$. Otherwise, they would
contain either ab or ba. But if v and y are blocks of one letter, then

$uv^2xy^2z$

has increased one or two clumps of solid letters (more a’s if v is a’s, etc.). However, there
are three clumps of solid letters in the words in $\{a^nb^na^n\}$, and not all three of those clumps
have been increased equally. This would destroy the form of the word.

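For example, if

$a^{200}b^{200}a^{200} = \underbrace{(a^{200}b^{70})}_{u}\,\underbrace{(b^{40})}_{v}\,\underbrace{(b^{90}a^{82})}_{x}\,\underbrace{(a^{3})}_{y}\,\underbrace{(a^{115})}_{z}$

then

$uv^2xy^2z = (a^{200}b^{70})(b^{40})^2(b^{90}a^{82})(a^{3})^2(a^{115})$
$= a^{200}b^{240}a^{203}$
$\neq a^nb^na^n$ for any n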

The b’s and the second clump of a’s were increased, but the first clump of a’s was not, so the
exponents are no longer the same.

We must emphasize that there is no possible decomposition of this w into uvxyz that can be pumped. It is not
good enough to show that one partition into five parts does not work. It should be understood
that we have shown that any attempted partition into uvxyz must fail to have uvvxyyz in the
language.

Therefore, the pumping lemma cannot successfully be applied to the language $\{a^nb^na^n\}$
at all. But the pumping lemma does apply to all context-free languages.

Therefore, $\{a^nb^na^n\}$ is not a context-free language. ■
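The claim that every partition fails can be spot-checked by brute force on a small word of the same shape. The sketch below, with hypothetical helper names, tries every split w = uvxyz with v and y not both empty and keeps those for which uvvxyyz is still in the language. It uses $a^3b^3a^3$ rather than $a^{200}b^{200}a^{200}$ only to keep the search small, so it illustrates the argument rather than replacing it. For contrast, it also finds a pumpable split of aaabbb in the context-free language $\{a^nb^n\}$.

```python
import re


def pumpable_splits(w, in_language):
    """All splits w = u v x y z, with v and y not both empty, such that
    u v v x y y z is still accepted by the predicate in_language."""
    good = []
    n = len(w)
    for i in range(n + 1):                      # u = w[:i]
        for j in range(i, n + 1):               # v = w[i:j]
            for k in range(j, n + 1):           # x = w[j:k]
                for m in range(k, n + 1):       # y = w[k:m], z = w[m:]
                    if i == j and k == m:
                        continue                # v = y = Lambda is not allowed
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if in_language(u + v + v + x + y + y + z):
                        good.append((u, v, x, y, z))
    return good


def is_anbnan(s):
    """Membership in {a^n b^n a^n : n >= 1}."""
    match = re.fullmatch(r"(a+)(b+)(a+)", s)
    return bool(match) and len(match[1]) == len(match[2]) == len(match[3])


def is_anbn(s):
    """Membership in {a^n b^n : n >= 1}."""
    match = re.fullmatch(r"(a+)(b+)", s)
    return bool(match) and len(match[1]) == len(match[2])


print(pumpable_splits("aaabbbaaa", is_anbnan))                           # []
print(("aa", "a", "", "b", "bb") in pumpable_splits("aaabbb", is_anbn))  # True
```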


EXAMPLE 

Let us take, just for the duration of this example, a language over the alphabet $\Sigma = \{a\ b\ c\}$.
Consider the language

$\{a^nb^nc^n \text{ for } n = 1, 2, 3, \ldots\}$

= {abc aabbcc aaabbbccc . . .}

We shall now prove that this language is non-context-free. 

Suppose it were context-free and suppose that the word 

$w = a^{200}b^{200}c^{200}$

is large enough so that the pumping lemma applies to it. (That means longer than $2^p$, where p
is the number of live productions.) We shall now show that no matter what choices are made
for the five parts u, v, x, y, z,

$uv^2xy^2z$

cannot be in the language. 

Again, we begin with an observation. 
Observation

All words in $a^nb^nc^n$ have:

Only one substring ab
Only one substring bc
No substring ac
No substring ba
No substring ca
No substring cb

no matter what n is.
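The observation is easy to test directly. This sketch (the function names are ours) counts the two-letter substrings of a word, confirms the observation for the first few words $a^nb^nc^n$, and then shows what pumping a v that straddles the a/b boundary does. The particular split of aaabbbccc used here is just one hypothetical choice among many.

```python
from collections import Counter


def two_letter_counts(s):
    """Count every two-letter substring of s (overlaps included)."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def observation_holds(s):
    """Exactly one ab, exactly one bc, and no ac, ba, ca, or cb."""
    c = two_letter_counts(s)
    return (c["ab"] == 1 and c["bc"] == 1
            and all(c[pair] == 0 for pair in ("ac", "ba", "ca", "cb")))


for n in range(1, 8):
    assert observation_holds("a" * n + "b" * n + "c" * n)

# One hypothetical split of aaabbbccc in which v = "ab" straddles a boundary
u, v, x, y, z = "aa", "ab", "bb", "c", "cc"
pumped = u + v * 2 + x + y * 2 + z
print(pumped, observation_holds(pumped))   # aaababbbcccc False
```

Doubling v creates a second ab and a ba, which is exactly the kind of violation the Conclusion below relies on.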

Conclusion 

If v or y is not a solid block of one letter (or $\Lambda$), then

$uv^2xy^2z$

would have more of some of the two-letter substrings ab, ac, ba, bc, ca, cb than it is supposed
to have. On the other hand, if v and y are solid blocks of one letter (or $\Lambda$), then one or
two of the letters a, b, c would be increased in the word uvvxyyz, whereas the other letter (or
letters) would not increase in quantity. But all the words in $a^nb^nc^n$ have equal numbers of a’s,
b’s, and c’s. Therefore, the pumping lemma cannot apply to the language $\{a^nb^nc^n\}$, which
means that this language is non-context-free. ■

Theorem 34 and Theorem 13 (initially discussed on pp. 360 and 190, respectively) have
certain things in common. They are both called a “pumping lemma,” and they were both
proven by Bar-Hillel, Perles, and Shamir. What else?


THEOREM 13 

If w is a word in a regular language L and w is long enough, then w can be decomposed into
three parts: w = xyz, such that all the words $xy^nz$ must also be in L.


THEOREM 34

If w is a word in a context-free language L and w is long enough, then w can be decomposed
into five parts: w = uvxyz, such that all the words $uv^nxy^nz$ must also be in L.


The proof of Theorem 13 is that the path for w must be so long that it contains a sequence
of edges that we can repeat indefinitely. The proof of Theorem 34 is that the derivation
for w must be so long that it contains a sequence of productions that we can repeat indefinitely.

We use Theorem 13 to show that $\{a^nb^n\}$ is not regular because it cannot contain both xyz
and xyyz. We use Theorem 34 to show that $\{a^nb^na^n\}$ is not context-free because it cannot
contain both uvxyz and uvvxyyz.

One major difference is that the pumping lemma for regular languages acts on the machines,
whereas the pumping lemma for context-free languages acts on the algebraic representation,
the grammar.

There is one more similarity between the pumping lemma for context-free languages
and the pumping lemma for regular languages. Just as Theorem 13 required Theorem 14 to
finish the story, so Theorem 34 requires Theorem 35 to achieve its full power.

Let us look in detail at the proof of the pumping lemma. We start with a word w of more
than $2^p$ letters. The path from some bottom letter back up to S contains more nonterminals
than there are live productions. Therefore, some nonterminal is repeated along the path. Here
is the new point: If we look for the first repeated nonterminal backing up from the letter, the
second occurrence will be within p steps up from the terminal row (the bottom). Just because
we said that length(w) > $2^p$ does not mean it is only a little bigger. Perhaps length(w) = $10^p$.
Even so, the upper member of the first self-embedded nonterminal pair encountered scanning from
the bottom is within p steps of the bottom row in the derivation tree.

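What significance does this have? It means that the upper of the two self-embedded
nonterminals produces a string no longer than $2^p$ letters in total (each live production in CNF
splits one nonterminal into two, so a subtree reaching at most p steps down has at most $2^p$
letters at its bottom). The string it produces is vxy. Therefore, we can say that

$\text{length}(vxy) \le 2^p$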

This observation turns out to be very useful, so we call it a theorem: the pumping 
lemma with length. 


THEOREM 35 


Let L be a CFL generated by a CFG in CNF with p live productions.

Then any word w in L with length > $2^p$ can be broken into five parts:

$w = uvxyz$

such that

$\text{length}(vxy) \le 2^p$
$\text{length}(x) > 0$
$\text{length}(v) + \text{length}(y) > 0$

and such that all the words

$uvvxyyz$
$uvvvxyyyz$
$uvvvvxyyyyz$
$\ldots$
$uv^nxy^nz$

are in the language L. ■

The discussion above has already proven this result. 

We now demonstrate an application to a language that cannot be shown to be non-context-free
by Theorem 34, but can be by Theorem 35.


EXAMPLE 

Let us consider the language 

$L = \{a^nb^ma^nb^m\}$

where n and m are integers 1, 2, 3, . . . and n does not necessarily equal m.

L = {abab aabaab abbabb aabbaabb aaabaaab . . .}
If we tried to prove that this language was non-context-free using Theorem 34 (p. 360),
we could have

$u = \Lambda$
$v = \text{first } a\text{’s} = a^n$
$x = \text{middle } b\text{’s} = b^m$
$y = \text{second } a\text{’s} = a^n$
$z = \text{last } b\text{’s} = b^m$
$uv^kxy^kz = \Lambda\,(a^n)^k\,b^m\,(a^n)^k\,b^m = a^{kn}b^ma^{kn}b^m$

all of which are in L. Therefore, we have no contradiction and the pumping lemma does apply
to L. ■

Now let us try the pumping lemma with length approach. If L did have a CFG that generates
it, let that CFG in CNF have p live productions. Let us look at the word

$a^{2^p}b^{2^p}a^{2^p}b^{2^p}$

This word is long enough for us to apply Theorem 35 to it. But from Theorem 35
we know that

$\text{length}(vxy) \le 2^p$

so v and y cannot be solid blocks of one letter separated by a clump of the other letter,
because the separating clump is too long to fit inside vxy together with v and y.

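By the usual argument (counting substrings of “ab” and “ba”), we see that v and y must each
be a solid block of one letter. But because of the length condition, v and y must lie within a
single clump or within two adjacent clumps, so pumping enlarges at most two adjacent clumps
and never both the first and third clumps, or both the second and fourth.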

However, this now means that uvvxyyz is not of the form

$a^nb^ma^nb^m$

but by Theorem 35 it must be in L. This contradiction shows that L is non-context-free. ■
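The difference between the two theorems shows up clearly in a brute-force search. In the sketch below (a small stand-in, not a proof), the integer size plays the role of $2^p$. Without any bound, the split $u = \Lambda$, $v = a^n$, $x = b^m$, $y = a^n$, $z = b^m$ pumps successfully. Once we require length(vxy) ≤ size, however, no split of $a^{size}b^{size}a^{size}b^{size}$ survives doubling v and y. The helper names and the choice size = 3 are assumptions made for illustration.

```python
import re


def pumpable_splits(w, in_language, max_vxy=None):
    """Splits w = u v x y z (v, y not both empty) with u v v x y y z still in
    the language; if max_vxy is given, also require len(v x y) <= max_vxy."""
    good = []
    n = len(w)
    for i in range(n + 1):
        for j in range(i, n + 1):
            for k in range(j, n + 1):
                for m in range(k, n + 1):
                    if i == j and k == m:
                        continue                        # v = y = Lambda
                    if max_vxy is not None and m - i > max_vxy:
                        continue                        # vxy = w[i:m] too long
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if in_language(u + v + v + x + y + y + z):
                        good.append((u, v, x, y, z))
    return good


def in_L(s):
    """Membership in {a^n b^m a^n b^m : n, m >= 1}."""
    match = re.fullmatch(r"(a+)(b+)(a+)(b+)", s)
    return (bool(match) and len(match[1]) == len(match[3])
            and len(match[2]) == len(match[4]))


size = 3                                   # stand-in for 2**p
w = "a" * size + "b" * size + "a" * size + "b" * size
unbounded = pumpable_splits(w, in_L)
bounded = pumpable_splits(w, in_L, max_vxy=size)
print(("", "aaa", "bbb", "aaa", "bbb") in unbounded)   # True
print(bounded)                                          # []
```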


EXAMPLE 


Let us consider the language 
DOUBLEWORD 


= {ss where s is any string of a’s and b’s}
= {$\Lambda$ aa bb aaaa abab baba bbbb . . .}


In Chapter 10, p. 200, we showed that DOUBLEWORD is nonregular. Well, even more is
true. DOUBLEWORD is not even context-free. We shall prove this by contradiction.

If DOUBLEWORD were generated by a grammar in CNF with p live productions, then any
word with length greater than $2^p$ can be pumped, that is, decomposed into five strings uvxyz
such that uvvxyyz is also in DOUBLEWORD and $\text{length}(vxy) \le 2^p$.

Let n be some integer greater than $2^p$ and let our word to be pumped be

$w = a^nb^na^nb^n$

which is clearly in DOUBLEWORD and more than long enough. Now because length(vxy)
is at most $2^p$, it is also less than n. If the vxy section is contained entirely in one solid letter
clump, then replacing it with vvxyy will increase only one clump and not the others, thus
breaking the pattern, and the pumped word would not be in DOUBLEWORD. Therefore, we
can conclude that the vxy substring spans two clumps of letters. Notice that it cannot be long
enough to span three clumps. This means that vxy contains a substring ab or a substring ba.
When we form uvvxyyz, it may then no longer be in the form a*b*a*b*, but it might still be
in DOUBLEWORD. However, further analysis will show that it cannot be.

It is possible that the substring ab or ba is not completely inside any of the parts v, x, or
y but lies between them. In this case, uvvxyyz keeps the pattern a*b*a*b* but increases
two consecutive clumps in size. Any way of doing this would break the pattern of ss of
DOUBLEWORD. This would also be true if the ab or ba were contained within the x-part.
So, the ab or ba must live in the v-part or the y-part.

Let us consider what would happen if the ab were in the v-part. Then v is of the form
$a^+b^+$. So, vxy would lie within some $a^nb^n$. Because the v-part contains the substring
ab, the xy-part would lie entirely within the $b^n$. (Notice that it cannot stretch to the next $a^n$,
since its total length is less than n.) Therefore, x and y are both strings of b’s that can be absorbed
by the $b^*$ section on their right. Also, v starts with some a’s that can be absorbed by
the $a^*$ section on its left. Thus, uvvxyyz is of the form

$\underbrace{a^*}_{u}\,\underbrace{a^+b^+a^+b^+}_{vv}\,\underbrace{b^*}_{xyy}\,\underbrace{b^*a^nb^n}_{z} = a^+b^+a^+b^+a^+b^+$

If s begins and ends with different letters, then ss has an even number of a clumps and
an even number of b clumps. If s begins and ends with the same letter, then ss will have an
odd number of clumps of that letter (the two copies of s merge into one clump at the junction)
but an even number of clumps of the other letter. In any case, ss cannot have an odd number
of clumps of both letters, and this string is not in DOUBLEWORD.

The same argument holds if the ab or ba substring is in the y-part. Therefore, w cannot
be pumped, so DOUBLEWORD is non-context-free. ■
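Both halves of this argument can be spot-checked by computer. The sketch below uses helper names of our own, with n = 4 as a small stand-in for an integer greater than $2^p$. It first counts clumps to confirm that no ss over {a, b}, with s of length at most 8, has an odd number of clumps of both letters. It then searches every split of $a^4b^4a^4b^4$ with length(vxy) < n for one whose doubled version is still in DOUBLEWORD. Finding none is consistent with the proof, though a finite search is not itself a proof.

```python
from itertools import product


def clumps(s, letter):
    """Number of maximal blocks (clumps) of `letter` in s."""
    return sum(1 for i, ch in enumerate(s)
               if ch == letter and (i == 0 or s[i - 1] != letter))


def is_doubleword(s):
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def pumpable_splits(w, in_language, max_vxy):
    """Splits w = u v x y z with v, y not both empty and len(v x y) <= max_vxy
    such that u v v x y y z is still in the language."""
    good = []
    n = len(w)
    for i in range(n + 1):
        for j in range(i, n + 1):
            for k in range(j, n + 1):
                for m in range(k, min(n, i + max_vxy) + 1):   # keeps m - i <= max_vxy
                    if i == j and k == m:
                        continue
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if in_language(u + v + v + x + y + y + z):
                        good.append((u, v, x, y, z))
    return good


# 1. ss never has an odd number of clumps of both letters.
odd_both = [s for length in range(9)
            for s in map("".join, product("ab", repeat=length))
            if clumps(s + s, "a") % 2 == 1 and clumps(s + s, "b") % 2 == 1]
print(odd_both)                                            # []

# 2. No short-enough split of a^n b^n a^n b^n can be pumped.
n = 4
w = ("a" * n + "b" * n) * 2
print(pumpable_splits(w, is_doubleword, max_vxy=n - 1))    # []
```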

PROBLEMS 

1. Study this CFG for EVENPALINDROME: 

$S \to aSa$
$S \to bSb$
$S \to \Lambda$

List all the derivation trees in this grammar that do not have two equal nonterminals
on the same line of descent, that is, that do not have a self-embedded nonterminal.

2. Consider the CNF for NONNULLEVENPALINDROME given below: 

$S \to AX$
$X \to SA$
$S \to BY$
$Y \to SB$
$S \to AA$
$S \to BB$
$A \to a$
$B \to b$

(i) Show that this CFG defines the language it claims to define. 

(ii) Find all the derivation trees in this grammar that do not have a self-embedded nonterminal.

(iii) Compare this result with Problem 1. 

3. The grammar defined in Problem 2 has six live productions. This means that the second
theorem of this section implies that all words of more than $2^6 = 64$ letters must have a
self-embedded nonterminal. Find a better result. What is the smallest number of letters
that guarantees that a word in this grammar has a self-embedded nonterminal in each of
its derivations? Why does the theorem give the wrong number?

4. Consider the grammar given below for the language defined by a*ba*:

$S \to AbA$
$A \to Aa \mid \Lambda$

(i) Convert this grammar to one without $\Lambda$-productions.

(ii) Chomsky-ize this grammar. 

(iii) Find all words that have derivation trees that have no self-embedded nonterminals. 

5. Consider the grammar for $\{a^nb^n\}$:

$S \to aSb \mid ab$

(i) Chomsky-ize this grammar. 

(ii) Find all derivation trees that do not have self-embedded nonterminals. 

6. Instead of the concept of live productions in CNF, let us define a live nonterminal to be
one appearing at the left side of a live production. A dead nonterminal N is one with
only productions of the single form

$N \to \text{terminal}$

If m is the number of live nonterminals in a CFG in CNF, prove that any word w of
length more than $2^m$ will have self-embedded nonterminals.

7. Illustrate the theorem in Problem 6 on the CFG in Problem 2. 

8. Apply the theorem of Problem 6 to the following CFG for NONNULLPALINDROME: 

$S \to AX \qquad S \to a$
$X \to SA \qquad S \to b$
$S \to BY \qquad A \to a$
$Y \to SB \qquad B \to b$


9. Prove that the language 


$\{a^nb^na^nb^n \text{ for } n = 1, 2, 3, 4, \ldots\}$
= {abab aabbaabb . . .}

is non-context-free.


10. Prove that the language 

$\{a^nb^na^nb^na^n \text{ for } n = 1, 2, 3, 4, \ldots\}$

= {ababa aabbaabbaa . . .}

is non-context-free. 

11. Let L be the language of all words of any of the following forms: 

$\{a^n \quad a^nb^n \quad a^nb^na^n \quad a^nb^na^nb^n \quad a^nb^na^nb^na^n \quad \ldots \text{ for } n = 1, 2, 3, \ldots\}$

= {a aa ab aaa aba aaaa aabb aaaaa ababa aaaaaa aaabbb
aabbaa . . .}

(i) How many words does this language have with 105 letters? 

(ii) Prove that this language is non-context-free. 


12. Is the language

$\{a^nb^{3n}a^n \text{ for } n = 1, 2, 3, \ldots\}$

= {abbba aabbbbbbaa . . .}

context-free? If so, find a CFG for it. If not, prove that it is not.

13. Consider the language 

$\{a^nb^nc^m \text{ for } n, m = 1, 2, 3, \ldots,\ n \text{ not necessarily } = m\}$

= {abc abcc aabbc abccc aabbcc . . .}

Is it context-free? Prove that your answer is correct. 

14. Show that the language 

$\{a^nb^nc^nd^n \text{ for } n = 1, 2, 3, \ldots\}$

= {abcd aabbccdd . . .}

is non-context-free. 

15. Why does the pumping lemma argument not show that the language PALINDROME is
not context-free? Show how v and y can be found such that $uv^nxy^nz$ are all also in
PALINDROME no matter what the word w is.

16. Let VERYEQUAL be the language of all words over $\Sigma = \{a\ b\ c\}$ that have the same
number of a’s, b’s, and c’s.

VERYEQUAL = {abc acb bac bca cab cba aabbcc aabcbc . . .}

Notice that the order of these letters does not matter. Prove that VERYEQUAL is non-context-free.

17. The language EVENPALINDROME can be defined as all words of the form

s reverse(s)

where s is any string of letters from (a + b)*. Let us define the language UPDOWNUP
as

L = {all words of the form s(reverse(s))s where s is in (a + b)*}

= {aaa bbb aaaaaa abbaab baabba bbbbbb . . . aaabbaaaaaab . . .}

Prove that L is non-context-free.

18. Using an argument similar to the one on p. 195, show that the language 

$\text{PRIME} = \{a^p \text{ where } p \text{ is a prime}\}$

is non-context-free. 

19. Using an argument similar to the one for Chapter 10, Problem 6(i), prove that 

$\text{SQUARE} = \{a^{n^2} \text{ where } n = 1, 2, 3, \ldots\}$

is non-context-free. 

20. Problems 18 and 19 are instances of one larger principle. Prove: 

Theorem 

If L is a language over the one-letter alphabet $\Sigma = \{a\}$ and L can be shown to be nonregular
using the pumping lemma for regular languages, then L can be shown to be non-context-free
using the pumping lemma for CFLs.
CHAPTER 17


Context-Free Languages


CLOSURE PROPERTIES 

In Part I, we showed that the union, the product, the Kleene closure, the complement, and 
the intersection of regular languages are all regular. We are now at the same point in our
discussion of context-free languages. In this section, we prove that the union, the product, and
the Kleene closure of context-free languages are context-free. What we shall not do is show 
that the complement and intersection of context-free languages are context-free. Rather, we 
show in the next section that this is not true in general. 


THEOREM 36 

If $L_1$ and $L_2$ are context-free languages, then their union, $L_1 + L_2$, is also a context-free
language. In other words, the context-free languages are closed under union.


PROOF 1 (by grammars) 

This will be a proof by constructive algorithm, which means that we shall show how to create
the grammar for $L_1 + L_2$ out of the grammars for $L_1$ and $L_2$.

Because $L_1$ and $L_2$ are context-free languages, there must be some CFGs that generate
them.


Let the CFG for $L_1$ have the start symbol S and the nonterminals A, B, C, . . . . Let us
change this notation a little by renaming the start symbol $S_1$ and the nonterminals $A_1, B_1, C_1, \ldots$.
All we do is add the subscript 1 onto each nonterminal. For example, if the grammar
were originally

$S \to aS \mid SS \mid AS \mid \Lambda$
$A \to AA \mid b$

it would become

$S_1 \to aS_1 \mid S_1S_1 \mid A_1S_1 \mid \Lambda$
$A_1 \to A_1A_1 \mid b$

where the new nonterminals are $S_1$ and $A_1$.

Notice that we leave the terminals alone. Clearly, the language generated by this CFG from
$S_1$ is the same as before, because the added 1’s do not affect the strings of terminals derived.

Let us do something comparable to a CFG that generates $L_2$. We add a subscript 2 to
each nonterminal symbol. For example,

$S \to AS \mid SB \mid \Lambda$
$A \to aA \mid a$
$B \to bB \mid b$

becomes

$S_2 \to A_2S_2 \mid S_2B_2 \mid \Lambda$
$A_2 \to aA_2 \mid a$
$B_2 \to bB_2 \mid b$

Again, we should note that this change in the names of the nonterminals has no effect 
on the language generated. 

Now we build a new CFG with productions and nonterminals that are those of the
rewritten CFG for $L_1$ and the rewritten CFG for $L_2$, plus the new start symbol S and the
additional production

$S \to S_1 \mid S_2$

Because we have been careful to see that there is no overlap in the use of nonterminals, once
we begin $S \to S_1$, we cannot then apply any productions from the grammar for $L_2$. All words
with derivations that start $S \to S_1$ belong to $L_1$, and all words with derivations that begin
$S \to S_2$ belong to $L_2$.

All words from both languages can obviously be generated from S. Because we have created
a CFG that generates the language $L_1 + L_2$, we conclude it is a context-free language.

■ 
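The construction in this proof is mechanical enough to code directly. The following Python sketch assumes a grammar is a dict from nonterminals to lists of right-hand sides (tuples of symbols, with () for Λ), that terminals are single letters, and that no nonterminal name already ends in 1 or 2, so tagging cannot cause collisions. These conventions and all function names are our own. A bounded generator, computed as a least fixed point, lists the words up to a given length so that the union grammar can be compared with the two original grammars (PALINDROME and $\{a^nb^n\}$, as in the example that follows).

```python
def tag(grammar, suffix):
    """Append suffix to every nonterminal; terminals are left alone."""
    def rename(sym):
        return sym + suffix if sym in grammar else sym
    return {rename(N): [tuple(rename(s) for s in rhs) for rhs in rhss]
            for N, rhss in grammar.items()}


def union_grammar(g1, g2):
    """Theorem 36: tag the nonterminals, then add S -> S1 | S2."""
    combined = {"S": [("S1",), ("S2",)]}
    combined.update(tag(g1, "1"))
    combined.update(tag(g2, "2"))
    return combined


def words_upto(grammar, start, max_len):
    """All words of length <= max_len derivable from start (least fixed point)."""
    lang = {N: set() for N in grammar}
    changed = True
    while changed:
        changed = False
        for N, rhss in grammar.items():
            for rhs in rhss:
                partial = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    partial = {p + word for p in partial for word in options
                               if len(p) + len(word) <= max_len}
                if not partial <= lang[N]:
                    lang[N] |= partial
                    changed = True
    return lang[start]


PAL = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",), ()]}
ANBN = {"S": [("a", "S", "b"), ()]}
G = union_grammar(PAL, ANBN)
print(G["S"])                                              # [('S1',), ('S2',)]
print(sorted(words_upto(G, "S", 3), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'bb', 'aaa', 'aba', 'bab', 'bbb']
```

The fixed-point generator copes with Λ-productions and with left recursion, because it only ever combines words it has already found.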


EXAMPLE 

Let $L_1$ be PALINDROME. One CFG for $L_1$ is

$S \to aSa \mid bSb \mid a \mid b \mid \Lambda$

Let $L_2$ be $\{a^nb^n\}$. One CFG for $L_2$ is

$S \to aSb \mid \Lambda$

Theorem 36 recommends the following CFG for $L_1 + L_2$:

$S \to S_1 \mid S_2$
$S_1 \to aS_1a \mid bS_1b \mid a \mid b \mid \Lambda$
$S_2 \to aS_2b \mid \Lambda$ ■

No guarantee was made in this proof that the grammar proposed for $L_1 + L_2$ was the
simplest or most intelligent CFG for the union language, as we can see from the following.
One CFG for the language EVENPALINDROME is

$S \to aSa \mid bSb \mid \Lambda$

One CFG for the language ODDPALINDROME is 

$S \to aSa \mid bSb \mid a \mid b$

Using the algorithm of the preceding proof, we produce the following CFG for PALINDROME:

PALINDROME = EVENPALINDROME + ODDPALINDROME

$S \to S_1 \mid S_2$
$S_1 \to aS_1a \mid bS_1b \mid \Lambda$
$S_2 \to aS_2a \mid bS_2b \mid a \mid b$

We have seen more economical grammars for this language before. ■ 

No stipulation was made in this theorem that the set of terminals for the two languages
had to be the same.


EXAMPLE 

Let $L_1$ be PALINDROME over the alphabet $\Sigma_1 = \{a\ b\}$, whereas let $L_2$ be $\{c^nd^n\}$ over the
alphabet $\Sigma_2 = \{c\ d\}$. Then one CFG that generates $L_1 + L_2$ is

$S \to S_1 \mid S_2$
$S_1 \to aS_1a \mid bS_1b \mid a \mid b \mid \Lambda$
$S_2 \to cS_2d \mid \Lambda$

This is a language over the alphabet $\{a\ b\ c\ d\}$. ■

In the proof of Theorem 36, we made use of the fact that context-free languages are 
generated by context-free grammars. However, we could also have proven this result using 
the alternative fact that context-free languages are those accepted by PDAs. 


PROOF 2 (by machines) 

Because $L_1$ and $L_2$ are context-free languages, we know (from the previous chapter) that
there is a $\text{PDA}_1$ that accepts $L_1$ and a $\text{PDA}_2$ that accepts $L_2$.

We can construct a $\text{PDA}_3$ that accepts the language $L_1 + L_2$ by amalgamating the
START states of these two machines. This means that we draw only one START state and
from it come all the edges that used to come from either prior START state.

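[Figure: the START state of $\text{PDA}_1$ and the START state of $\text{PDA}_2$, each with its outgoing edges]

becomes

[Figure: a single START state with all of those outgoing edges]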

Once an input string starts on a path on this combined machine, it follows the path either
entirely within $\text{PDA}_1$ or entirely within $\text{PDA}_2$ because there are no cross-over edges.

Any input reaching an ACCEPT state has been accepted by one machine or the other
and so is in $L_1$ or $L_2$. Also, any word in $L_1 + L_2$ can find its old path to acceptance on the
subpart of $\text{PDA}_3$ that resembles $\text{PDA}_1$ or $\text{PDA}_2$. ■

Notice how the nondeterminism of the START state is important in the proof above. We 
could also do this amalgamation of machines using a single-edge START state by weaseling 
our way out, as we saw in Chapter 14. 


EXAMPLE 


Consider these two machines: 

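$\text{PDA}_1$

[Figure: $\text{PDA}_1$]

$\text{PDA}_2$

[Figure: $\text{PDA}_2$, consisting of START, a READ state, and ACCEPT]

$\text{PDA}_1$ accepts the language of all words that contain a double a. $\text{PDA}_2$ accepts all words
that begin with an a. The machine for $L_1 + L_2$ is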

[Figure: $\text{PDA}_3$]

Notice that we have drawn $\text{PDA}_3$ with only one ACCEPT state by combining the
ACCEPT states from $\text{PDA}_1$ and $\text{PDA}_2$.

This was not mentioned in the algorithm in the proof, but it only simplifies the picture 
without changing the substance of the machine. 


THEOREM 37 

If $L_1$ and $L_2$ are context-free languages, then so is $L_1L_2$. In other words, the context-free
languages are closed under product.


PROOF 1 (by grammars)

Let $\text{CFG}_1$ and $\text{CFG}_2$ be context-free grammars that generate $L_1$ and $L_2$, respectively. Let us
begin with the same trick we used last time: putting a 1 after every nonterminal in $\text{CFG}_1$
(including S) and a 2 after every nonterminal in $\text{CFG}_2$.

Now we form a new CFG using all the old productions in $\text{CFG}_1$ and $\text{CFG}_2$ and adding
the new START symbol S and the production

$S \to S_1S_2$

Any word generated by this CFG has a front part derived from $S_1$ and a rear derived
from $S_2$. The two sets of productions cannot cross over and interact with each other because
the two sets of nonterminals are completely disjoint. It is therefore in the language $L_1L_2$.

The fact that any word in $L_1L_2$ can be derived in this grammar should be no surprise. ■

(We have taken a little liberty with mathematical etiquette in our use of the phrase
“. . . should be no surprise.” It is more accepted practice to use the clichés “obviously . . . ,”
“clearly . . . ,” or “trivially . . . .” But it is only a matter of style. A proof
only needs to explain enough to be convincing. Other virtues a proof might have are that
it be interesting, lead to new results, or be constructive. The proof above is at least the
latter.)


EXAMPLE 


Let $L_1$ be PALINDROME and $\text{CFG}_1$ be

$S \to aSa \mid bSb \mid a \mid b \mid \Lambda$

Let $L_2$ be $\{a^nb^n\}$ and $\text{CFG}_2$ be

$S \to aSb \mid \Lambda$

The algorithm in the proof recommends the CFG
$S \to S_1S_2$
$S_1 \to aS_1a \mid bS_1b \mid a \mid b \mid \Lambda$
$S_2 \to aS_2b \mid \Lambda$

for the language $L_1L_2$. ■
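The product construction is just as easy to try out. The sketch below assumes a grammar is stored as a dict from nonterminals to lists of right-hand sides (tuples of symbols, with () for Λ), that terminals are single letters, and that no nonterminal name already ends in 1 or 2. The names tag, product_grammar, and words_upto are our own. It builds $S \to S_1S_2$ from the two example grammars and lists the words of length at most 3 by a bounded least-fixed-point computation. Every word it prints is a palindrome followed by a word of $\{a^nb^n\}$.

```python
def tag(grammar, suffix):
    """Append suffix to every nonterminal; terminals are left alone."""
    def rename(sym):
        return sym + suffix if sym in grammar else sym
    return {rename(N): [tuple(rename(s) for s in rhs) for rhs in rhss]
            for N, rhss in grammar.items()}


def product_grammar(g1, g2):
    """Theorem 37: tag the nonterminals, then add S -> S1 S2."""
    combined = {"S": [("S1", "S2")]}
    combined.update(tag(g1, "1"))
    combined.update(tag(g2, "2"))
    return combined


def words_upto(grammar, start, max_len):
    """All words of length <= max_len derivable from start (least fixed point)."""
    lang = {N: set() for N in grammar}
    changed = True
    while changed:
        changed = False
        for N, rhss in grammar.items():
            for rhs in rhss:
                partial = {""}
                for sym in rhs:
                    options = lang[sym] if sym in grammar else {sym}
                    partial = {p + word for p in partial for word in options
                               if len(p) + len(word) <= max_len}
                if not partial <= lang[N]:
                    lang[N] |= partial
                    changed = True
    return lang[start]


PAL = {"S": [("a", "S", "a"), ("b", "S", "b"), ("a",), ("b",), ()]}
ANBN = {"S": [("a", "S", "b"), ()]}
G = product_grammar(PAL, ANBN)
print(sorted(words_upto(G, "S", 3), key=lambda s: (len(s), s)))
# ['', 'a', 'b', 'aa', 'ab', 'bb', 'aaa', 'aab', 'aba', 'bab', 'bbb']
```

For example, aab appears only because of the product: it is the palindrome a followed by ab.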

(?) PROOF 2 (by machines) 

For the previous theorem we gave two proofs: one grammatical and one mechanical. There 
is an obvious way to proceed to give a machine proof for this theorem too. The front end of 
the word should be processed by one PDA and the rear end of the word processed on the 
second PDA. Let us see how this idea works out. 

If we have $\text{PDA}_1$ that accepts $L_1$ and $\text{PDA}_2$ that accepts $L_2$, we can try to build the
machine $\text{PDA}_3$ that accepts $L_1L_2$ as follows.

Draw a black dot. Now take all the edges of $\text{PDA}_1$ that feed into any ACCEPT state and
redirect them into the dot. Also take all the edges that come from the START state of $\text{PDA}_2$
and draw them coming out of the dot. Erase the old $\text{PDA}_1$ ACCEPT and the old $\text{PDA}_2$
START states.


[Figure: edges of $\text{PDA}_1$ entering ACCEPT, and edges of $\text{PDA}_2$ leaving START]

becomes



This kind of picture is not legal in a pushdown automaton drawing because we did not
list “a black dot” as one of the pieces in our definition of PDA. The black dot is not
necessary. We wish to connect every state that leads to ACCEPT-$\text{PDA}_1$ to every state in $\text{PDA}_2$ that
comes from START-$\text{PDA}_2$. We can do this by edges drawn directly pointing from one
machine to another. Alternatively, the edges from $\text{PDA}_1$ can lead into a new artificial state, PUSH
OVER, which is followed immediately by POP OVER, whose nondeterministic edges, all
labeled OVER, continue to $\text{PDA}_2$. Let us call this the black dot.

For an input string to be accepted by the new PDA, its path must first reach the black
dot and then proceed from the dot to the ACCEPT states of $\text{PDA}_2$. There is no path from the
START (of $\text{PDA}_1$) to ACCEPT (of $\text{PDA}_2$) without going through the dot. The front substring
with a path that leads up to the dot would be accepted by $\text{PDA}_1$, and the remaining substring
with a path that leads from the dot to ACCEPT would be accepted by $\text{PDA}_2$. Therefore, all
words accepted by this new machine are in the language $L_1L_2$.

It is also obvious that any word in $L_1L_2$ is accepted by this new machine.
Not so fast.

We did not put an end-of-proof mark, ■, after the last sentence because the proof actually is 
not valid. It certainly sounds valid. But it has a subtle flaw, which we shall illustrate. 

When an input string is being run on PDA } and it reaches ACCEPT, we may not have 
finished reading the entire INPUT TAPE. The two PDAs that were given in the preceding ex¬ 
ample (which we have redrawn below) illustrate this point perfectly. In the first, we reach the 
ACCEPT state right after reading a double a from the INPUT TAPE. The word baabbb will 
reach ACCEPT on this machine while it still has three b\ unread. 

The second machine presumes that it is reading the first letter of the L₂ part of the string and checks to be sure that the very first letter it reads is an a.

If we follow the algorithm as stated earlier, we produce the following. From

[Figure: the two PDAs, and the machine obtained by joining them through the black dot.]

The resultant machine will reject the input string (baabbb)(aa) even though it is in the language L₁L₂, because the black dot is reached after the third letter and the next letter it reads is a b, not the desired a, and so the machine will crash. Only words containing aaa are accepted by this machine.

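To see the flaw concretely, here is a small Python sketch that works at the level of languages rather than machines. Because the figures are not reproduced here, we assume, hypothetically, that PDA₁ accepts L₁ = words containing a double a and that PDA₂ accepts L₂ = words beginning with a; these stand-ins merely match the behavior described above. `in_product` tries every cut point, as a correct machine for L₁L₂ must, while `glued_machine` imitates the flawed construction, which passes through the black dot the first time PDA₁ would reach ACCEPT.

```python
# Hypothetical stand-ins (assumptions, not the book's machines):
# L1 = words containing a double a; L2 = words whose first letter is a.


def in_L1(w: str) -> bool:
    return "aa" in w


def in_L2(w: str) -> bool:
    return w.startswith("a")


def in_product(w: str) -> bool:
    """Correct membership test for L1L2: try every cut w = front + back."""
    return any(in_L1(w[:i]) and in_L2(w[i:]) for i in range(len(w) + 1))


def glued_machine(w: str) -> bool:
    """Mimics the flawed construction: the path goes through the black dot
    the first time PDA1 would reach ACCEPT (just after its first aa), and
    PDA2 must then accept whatever input is still unread."""
    i = w.find("aa")
    if i == -1:
        return False
    return in_L2(w[i + 2:])


print(in_product("baabbbaa"), glued_machine("baabbbaa"))  # True False
```

Every string the glued machine accepts contains aaa, as claimed above, yet it rejects (baabbb)(aa), which is in the product.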
For this technique to work, we must insist that PDA₁, which accepts L₁, have the property that it reads the whole input string before accepting. In other words, when the ACCEPT state is encountered, there must be no unread input left. What happens if we try to modify PDA₁ to meet this requirement? Suppose we use PDA₁ version 2, as on the next page, which employs a technique from the proof of Theorem 29 (p. 311):

[Figure: PDA₁ version 2.]

This machine does have the property that when we get to ACCEPT, there is nothing left on the TAPE. This is guaranteed by the READ loop right before ACCEPT. However, when we process the input (baabbb)(aa), we shall read all eight letters before reaching ACCEPT, and there will be nothing left to process on PDA₂ because we have insisted that the TAPE be exhausted by the first machine. Perhaps it is better to leave the number of letters read before the first ACCEPT up to the machine to decide nondeterministically.

If we try to construct PDA₃ as shown below using the modified PDA₁ with a nondeterministic feed into the black dot, we have another problem.

[Figure: PDA₃.]
This conglomerate will accept the input (baabbb)(bba) by reading the first two b's of the second factor in the PDA₁ part and then branching through the black dot to read the last letter on the second machine. However, this input string actually is in the language L₁L₂ because it is also of the form (baabbbbb)(a).

So this PDA₃ version works in this particular instance, but does it work in all cases? Are we convinced that even though we have incorporated some nondeterminism, there are no undesirable strings accepted?

As it stands, the preceding discussion is no proof. Luckily, this problem does not affect the first proof, which remains valid. This explains why we put the “?” in front of the word “proof” earlier. No matter how rigorous a proof appears, or how loaded with mathematical symbolism, it is always possible for systematic oversights to creep in undetected. The reason we have proofs at all is to try to stop this. But we never really know. We can never be sure that human error has not made us blind to substantial faults. The best we can do, even in purely symbolic abstract mathematics, is to try to be very, very clear and complete in our arguments, to try to understand what is going on, and to try many examples.


THEOREM 38 

If L is a context-free language, then L* is one too. In other words, the context-free languages 
are closed under the Kleene star. 




PROOF 

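Let us start with a CFG for the language L. As always, the start symbol for this language is the symbol S. Let us as before change this symbol (but no other nonterminals) to S₁ throughout the grammar. Let us then add to the list of productions the new production

S → S₁S | Λ

Now we can, by repeated use of this production, start with S and derive

$$S \Rightarrow S_1 S \Rightarrow S_1 S_1 S \Rightarrow S_1 S_1 S_1 S \Rightarrow \cdots \Rightarrow \underbrace{S_1 S_1 \cdots S_1}_{n} S \Rightarrow \underbrace{S_1 S_1 \cdots S_1}_{n}$$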


Following each of these S₁'s independently through the productions of the original CFG, we can form any word in L* made up of n concatenated words from L. To convince ourselves that the productions applied to the various separate word factors do not interfere in undesired ways, we need only think of the derivation tree. Each of these S₁'s is the root of a distinct branch. The productions along one branch of the tree do not affect those on another. Similarly, any word in L* can be generated by starting with enough copies of S₁. ■

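The grammar transformation in this proof is mechanical enough to sketch in a few lines of Python. The representation (a dictionary from nonterminals to lists of right-hand sides, with [] standing for Λ) and the name S1 for S₁ are our own assumptions; the function refuses to run if S1 is already a nonterminal, since the renaming must not collide with an existing symbol.

```python
# Grammar: nonterminal -> list of right-hand sides; each right-hand side is a
# list of symbols, and the empty list [] stands for the Λ-production.
Grammar = dict[str, list[list[str]]]


def star_grammar(g: Grammar, start: str = "S", new_name: str = "S1") -> Grammar:
    """Theorem 38 construction: rename the old start symbol to new_name
    throughout (no other nonterminal changes), then add S -> new_name S | Λ."""
    if new_name in g:
        raise ValueError(f"{new_name} is already used as a nonterminal")

    def rename(sym: str) -> str:
        return new_name if sym == start else sym

    out: Grammar = {rename(lhs): [[rename(s) for s in rhs] for rhs in rhss]
                    for lhs, rhss in g.items()}
    out[start] = [[new_name, start], []]  # S -> S1 S | Λ
    return out


palindrome = {"S": [["a", "S", "a"], ["b", "S", "b"], ["a"], ["b"], []]}
for lhs, rhss in star_grammar(palindrome).items():
    print(lhs, "->", " | ".join(" ".join(r) or "Λ" for r in rhss))
```

Applied to the PALINDROME grammar of the next example, this prints `S1 -> a S1 a | b S1 b | a | b | Λ` and `S -> S1 S | Λ`, the same grammar as given there with X written as S1.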

EXAMPLE 


If the CFG is

S → aSa | bSb | a | b | Λ

(which generates PALINDROME), then one possible CFG for PALINDROME* is

S → XS | Λ
X → aXa | bXb | a | b | Λ

Notice that we have used the symbol X instead of the nonterminal S₁, which was indicated in the algorithm in the proof. Of course, this makes no difference.
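Membership in L* can also be tested directly from a membership test for L, by asking whether the word can be cut into factors from L. The Python sketch below does this by dynamic programming over prefixes; the helper names are illustrative. One consequence is easy to confirm: since the one-letter words a and b are palindromes, every string over {a, b} can be cut into palindromes, so PALINDROME* is all of (a + b)*.

```python
def is_palindrome(w: str) -> bool:
    return w == w[::-1]


def in_star(w: str, in_L) -> bool:
    """w is in L* exactly when w can be cut into nonempty factors that all lie
    in L. ok[i] records whether the prefix w[:i] is in L*."""
    ok = [False] * (len(w) + 1)
    ok[0] = True  # Λ is always in L*
    for i in range(1, len(w) + 1):
        ok[i] = any(ok[j] and in_L(w[j:i]) for j in range(i))
    return ok[len(w)]


print(in_star("ab", is_palindrome))                  # True: (a)(b)
print(in_star("aba", lambda f: f in {"ab", "ba"}))   # False
```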
INTERSECTION AND COMPLEMENT

Here is a pretty wishy-washy result. 

THEOREM 39 

The intersection of two context-free languages may or may not be context-free. 

PROOF 

We shall break this proof into two parts: may and may not. 

May 

All regular languages are context-free (Theorem 21, p. 259). The intersection of two regular languages is regular (Theorem 12, p. 174). Therefore, if L₁ and L₂ are regular and context-free, then

$$L_1 \cap L_2$$

is both regular and context-free.

May Not

Let

$$L_1 = \{a^n b^n a^m, \text{ where } n, m = 1, 2, 3, \ldots, \text{ but } n \text{ is not necessarily the same as } m\}$$
$$= \{aba, abaa, aabba, \ldots\}$$

To prove that this language is context-free, we present a CFG that generates it:

S → XA
X → aXb | ab
A → aA | a

We could alternately have concluded that this language is context-free by observing that it is the product of the CFL $\{a^n b^n\}$ and the regular language aa*. Let

$$L_2 = \{a^n b^m a^m, \text{ where } n, m = 1, 2, 3, \ldots, \text{ but } n \text{ is not necessarily the same as } m\}$$
$$= \{aba, aaba, abbaa, \ldots\}$$

Be careful to notice that these two languages are different. 

To prove that this language is context-free, we present a CFG that generates it: 

S → AX
X → bXa | ba
A → aA | a

Alternately, we could observe that L₂ is the product of the regular language aa* and the CFL $\{b^n a^n\}$.

Both languages are context-free, but their intersection is the language

$$L_3 = L_1 \cap L_2 = \{a^n b^n a^n \text{ for } n = 1, 2, 3, \ldots\}$$

because any word in both languages has as many starting a's as middle b's (to be in L₁) and as many middle b's as final a's (to be in L₂).

But on p. 367, we proved that this language is non-context-free. Therefore, the intersection of two context-free languages can be non-context-free. ■
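A brute-force check cannot prove anything about infinitely many words, but it is a cheap way to confirm the set algebra above on short strings. This Python sketch (the function names are our own) tests, for every string over {a, b} of length at most 12, that being in both L₁ and L₂ is the same as being of the form $a^n b^n a^n$.

```python
import itertools
import re

SHAPE = re.compile(r"(a+)(b+)(a+)")


def blocks(w: str):
    """Return (p, q, r) if w = a^p b^q a^r with p, q, r >= 1, else None."""
    m = SHAPE.fullmatch(w)
    return tuple(len(g) for g in m.groups()) if m else None


def in_L1(w: str) -> bool:  # a^n b^n a^m
    t = blocks(w)
    return t is not None and t[0] == t[1]


def in_L2(w: str) -> bool:  # a^n b^m a^m
    t = blocks(w)
    return t is not None and t[1] == t[2]


def in_L3(w: str) -> bool:  # a^n b^n a^n
    t = blocks(w)
    return t is not None and t[0] == t[1] == t[2]


def words(max_len: int, alphabet: str = "ab"):
    """All strings over alphabet of length 0 .. max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)


mismatches = [w for w in words(12) if (in_L1(w) and in_L2(w)) != in_L3(w)]
print(mismatches)  # []
```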
EXAMPLE (May)

If L₁ and L₂ are two CFLs and if L₁ is contained in L₂, then the intersection is L₁ again, which is still context-free; for example,

$$L_1 = \{a^n \text{ for } n = 1, 2, 3, \ldots\}$$
$$L_2 = \text{PALINDROME}$$

L₁ is contained in L₂; therefore,

$$L_1 \cap L_2 = L_1$$

which is context-free.

Notice that in this example we do not have the intersection of two regular languages, since PALINDROME is nonregular.


EXAMPLE (May)

$$L_1 = \text{PALINDROME}$$
$$L_2 = \text{language of } a^+ b^+ a^+ = \text{language of } aa^*bb^*aa^*$$

In this case,

$$L_1 \cap L_2$$

is the language of all words with as many final a's as initial a's with only b's in between:

$$L_1 \cap L_2 = \{a^n b^m a^n,\ n, m = 1, 2, 3, \ldots, \text{ where } n \text{ is not necessarily equal to } m\}$$
$$= \{aba, abba, aabaa, aabbaa, \ldots\}$$

This language is still context-free because it can be generated by the grammar

S → aSa | B
B → bB | b

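or accepted by this PDA:

[Figure: PDA that pushes the front a's, reads past the b's, then alternately READs and POPs a's.]

First, all the front a's are put into the STACK. Then the b's are consumed and ignored. Then we alternately READ and POP a's until both the INPUT TAPE and STACK run out simultaneously.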

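The PDA just described happens to be deterministic, so its behavior can be sketched as a straight-line stack simulation. In this Python sketch the function names are illustrative, and each early `return False` plays the role of a crash; `in_intersection` restates the definition PALINDROME ∩ aa*bb*aa* directly, for comparison.

```python
import re


def accepts(w: str) -> bool:
    """Stack sketch of the PDA described above for {a^n b^m a^n : n, m >= 1}."""
    stack = []
    i = 0
    while i < len(w) and w[i] == "a":      # push every front a
        stack.append("a")
        i += 1
    if not stack or i == len(w) or w[i] != "b":
        return False                       # need at least one front a and one b
    while i < len(w) and w[i] == "b":      # consume and ignore the b's
        i += 1
    while i < len(w):                      # alternately READ an a and POP an a
        if w[i] != "a" or not stack:
            return False                   # crash: a b after the back a's, or too many a's
        stack.pop()
        i += 1
    return not stack                       # TAPE and STACK must run out together


def in_intersection(w: str) -> bool:
    """PALINDROME ∩ aa*bb*aa*, straight from the definitions."""
    return w == w[::-1] and re.fullmatch(r"a+b+a+", w) is not None


print([w for w in ["aba", "aabbaa", "aabaaa", "abab"] if accepts(w)])  # ['aba', 'aabbaa']
```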
Again note that these languages are not both regular (one is, one is not). 
We mention these two examples, in which the languages are not both regular, because the proof of the theorem as given might have conveyed the mistaken impression that the intersection of CFLs is a CFL only when the CFLs are regular.

EXAMPLE (May Not)

Let L₁ be the language

EQUAL = all words with the same number of a's and b's

We know this language is context-free because we have seen a grammar that generates it (p. 239):

S → bA | aB
A → bAA | aS | a
B → aBB | bS | b

Let L₂ be the language

$$L_2 = \{a^n b^m a^n,\ n, m = 1, 2, 3, \ldots, \text{ where } n \text{ may or may not equal } m\}$$

The language L₂ was shown to be context-free in the previous example. Now

$$L_3 = L_1 \cap L_2 = \{a^n b^{2n} a^n \text{ for } n = 1, 2, 3, \ldots\}$$
$$= \{abba, aabbbbaa, \ldots\}$$

To be in L₁ = EQUAL, the b-total must equal the a-total, so there are 2n b's in the middle if there are n a's in the front and the back.

We use the pumping lemma of Chapter 16 to prove that this language is non-context-free. As always, we observe that the sections of the word that get repeated cannot contain the substrings ab or ba, because all words in L₃ have exactly one of each substring. This means that the two repeated sections (the v-part and y-part) are each a clump of one solid letter. If we write some word w of L₃ as

w = uvxyz

then we can say of v and y that they are either all a's or all b's or one is Λ. However, if one is solid a's, that means that to remain a word of the form $a^n b^m a^n$, the other must also be solid a's because the front and back a's must remain equal. But then we would be increasing both clumps of a's without increasing the b's, and the word would then not be in EQUAL. If neither v nor y has a's, then they increase the b's without the a's, and again the word fails to be in EQUAL. Therefore, the pumping lemma cannot apply to L₃, so L₃ is non-context-free. ■

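The case analysis can be watched on a single word. The Python sketch below, our own illustration, tries every split w = uvxyz of aabbbbaa with v and y not both Λ, doubles v and y, and collects the splits whose result is still in L₃. It examines only one word and only the exponent 2, so it illustrates the argument rather than replacing it.

```python
import re


def in_L3(w: str) -> bool:
    """w in {a^n b^(2n) a^n : n >= 1}."""
    m = re.fullmatch(r"(a+)(b+)(a+)", w)
    if not m:
        return False
    p, q, r = (len(g) for g in m.groups())
    return p == r and q == 2 * p


def pumpable_splits(w: str, k: int = 2):
    """All splits w = u v x y z, with v and y not both empty, such that
    u v^k x y^k z is still in L3."""
    n = len(w)
    found = []
    for i in range(n + 1):                    # u = w[:i]
        for j in range(i, n + 1):             # v = w[i:j]
            for l in range(j, n + 1):         # x = w[j:l]
                for e in range(l, n + 1):     # y = w[l:e], z = w[e:]
                    u, v, x, y, z = w[:i], w[i:j], w[j:l], w[l:e], w[e:]
                    if v == "" and y == "":
                        continue
                    if in_L3(u + v * k + x + y * k + z):
                        found.append((u, v, x, y, z))
    return found


print(pumpable_splits("aabbbbaa"))  # []
```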
The question of when the intersection of two CFLs is a CFL is apparently very interesting. If an algorithm were known to answer this question, it would be printed right here. Instead, we shall move on to the question of complements.

The story of complements is similarly indecisive. 

THEOREM 40 

The complement of a context-free language may or may not be context-free. 

PROOF 


The proof occurs in two parts. 
May

If L is regular, then L' is also regular and both are context-free. 


May Not 

This is one of our few proofs by indirect argument. 

Suppose the complement of every context-free language were context-free. Then if we started with two such languages, L₁ and L₂, we would know that L₁' and L₂' are also context-free. Furthermore,

$$L_1' + L_2'$$

would have to be context-free by Theorem 36 (p. 376).

Not only that, but

$$(L_1' + L_2')'$$

would also have to be context-free, as the complement of a context-free language. But,

$$(L_1' + L_2')' = L_1 \cap L_2$$

and so then the intersection of L₁ and L₂ must be context-free. But L₁ and L₂ are any arbitrary CFLs, and therefore all intersections of context-free languages would have to be context-free. But by the previous theorem, we know that this is not the case.

Therefore, not all context-free languages have context-free complements. ■

EXAMPLE (May)

All regular languages have been covered in the proof above. There are also some nonregular but context-free languages that have context-free complements. One example is the language of palindromes with an X in the center, PALINDROMEX. This is a language over the alphabet Σ = {a b X}:

$$\text{PALINDROMEX} = \{wX\,\text{reverse}(w), \text{ where } w \text{ is any string in } (a + b)^*\}$$
$$= \{X, aXa, bXb, aaXaa, abXba, baXab, bbXbb, \ldots\}$$

This language can be accepted (as we have seen in Chapter 14, p. 301) by a deterministic PDA such as the one below:

[Figure: a deterministic PDA for PALINDROMEX in which every branch is drawn and every path ends at ACCEPT or REJECT.]
Because this is a deterministic machine, every input string determines a unique path
from START to a halt state, either ACCEPT or REJECT. We have drawn in all possible 
branching edges so that no input crashes. The strings not accepted all go to REJECT. In 
every loop, there is a READ statement that requires a fresh letter of input so that no input 
string can loop forever. (This is an important observation, although there are other ways to 
guarantee no infinite looping.) 

To construct a machine that accepts exactly those input strings that this machine rejects, all we need to do is reverse the status of the halt states from ACCEPT to REJECT and vice versa. This is the same trick we pulled on FAs to find machines for the complement language.

In this case, the language L' of all input strings over the alphabet Σ = {a b X} that are not in L is simply the language accepted by

[Figure: the same PDA with ACCEPT and REJECT interchanged.]

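Here is a Python sketch of a deterministic machine for PALINDROMEX together with its halt-status-reversed twin. It is a simulation rather than a transcription of the drawing, and the function names are our own; what matters is that every input reaches exactly one of the two verdicts, so interchanging them yields the complement. We assume inputs use only the letters a, b, and X.

```python
import itertools


def palindromex_dpda(w: str) -> str:
    """Deterministic sketch of a PDA for PALINDROMEX over {a, b, X}.
    Every input ends in ACCEPT or REJECT: no crashes, and each step
    consumes one letter, so nothing loops forever."""
    stack = []
    i = 0
    while i < len(w) and w[i] in "ab":    # push the letters before X
        stack.append(w[i])
        i += 1
    if i == len(w):
        return "REJECT"                   # no X at all
    i += 1                                # read the center X
    while i < len(w):                     # match against the top of the STACK
        if w[i] == "X" or not stack or stack.pop() != w[i]:
            return "REJECT"
        i += 1
    return "ACCEPT" if not stack else "REJECT"


def complement_machine(w: str) -> str:
    """The same machine with its two kinds of halt state interchanged."""
    return "REJECT" if palindromex_dpda(w) == "ACCEPT" else "ACCEPT"


def in_palindromex(w: str) -> bool:
    """Reference definition: w = v X reverse(v) with v in (a + b)*."""
    if w.count("X") != 1:
        return False
    left, right = w.split("X")
    return right == left[::-1]


print(palindromex_dpda("abXba"), complement_machine("abXba"))  # ACCEPT REJECT
```

The trick works here only because the machine is deterministic and total; the next paragraphs explain why it fails for nondeterministic PDAs.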
We may wonder why this trick cannot be used to prove that the complement of any 
context-free language is context-free, because they all can be defined by PDAs. The answer 
is nondeterminism. 

If we have a nondeterministic PDA, then the technique of reversing the status of the halt states fails. In a nondeterministic PDA, a word may have two possible paths, the first of which leads to ACCEPT and the second of which leads to REJECT. We accept this word because there is at least one way it can be accepted. Now if we reverse the status of each halt state, we still have two paths for this word: The first now leads to REJECT and the second now leads to ACCEPT. Again, we have to accept this word since at least one path leads to ACCEPT. The same word cannot be in both a language and its complement, so the halt-status-reversed PDA does not define the complement language.

We still owe an example of a context-free language with a complement that is non-context-free.

EXAMPLE (May Not) 

Whenever we are asked for an example of a non-context-free language, $\{a^n b^n a^n\}$ springs to mind. We seem to use it for everything. Surprisingly enough, its complement is context-free, as we shall now show, by taking the union of seven CFLs.

This example takes several steps. First, let us define the language $M_{pq}$ as follows:

$$M_{pq} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } p > q \text{ while } r \text{ is arbitrary}\}$$
$$= \{aaba, aaaba, aabaa, aaabaa, aaabba, \ldots\}$$

We know this language is context-free because it is generated by the following CFG:

S → AXA
X → aXb | ab
A → aA | a

The X-part is always of the form $a^n b^n$, and when we attach the A-parts, we get a string defined by the expression

$$(aa^*)(a^n b^n)(aa^*) = a^p b^q a^r, \text{ where } p > q$$

(Note: We are mixing regular expressions with things that are not regular expressions, but the meaning is clear anyway.)

This language can be shown to be context-free in two other ways. We could observe that $M_{pq}$ is the product of the three languages $a^+$, $\{a^n b^n\}$, and $a^+$:

$$M_{pq} = (a^+)(a^n b^n)(a^+)$$

Because the product of two context-free languages is context-free, so is the product of three context-free languages.

We could also build a PDA to accept it. The machine would have three READ statements. The first would read the initial clump of a's and push them into the STACK. The second would read b's and correspondingly pop a's. When the second READ hits the first a of the third clump, it knows the b's are over, so it pops another a to be sure the initial clump of a's (in the STACK) was larger than the clump of b's. Even when the input passes this test, the machine is not ready to accept. We must be sure that there is nothing else on the INPUT TAPE but unread a's. If there is a b hiding behind these a's, the input must be rejected. We therefore move into the third READ state that loops as long as a's are read, crashes if a b is read, and accepts as soon as a blank is encountered.

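The three-READ machine just described is deterministic, so it can be simulated directly. This Python sketch (the names are illustrative) follows the description step by step, with each early `return False` standing for a crash, and compares the result with the definition of $M_{pq}$ on every string of length below 10.

```python
import itertools
import re


def accepts_Mpq(w: str) -> bool:
    """Sketch of the three-READ PDA described above for
    M_pq = {a^p b^q a^r : p > q, with p, q, r >= 1}."""
    stack = []
    i, n = 0, len(w)
    while i < n and w[i] == "a":          # first READ: push the front a's
        stack.append("a")
        i += 1
    if not stack or i == n or w[i] != "b":
        return False                      # need some front a's, then a b
    while i < n and w[i] == "b":          # second READ: each b pops one a
        if not stack:
            return False                  # more b's than front a's: crash
        stack.pop()
        i += 1
    if i == n or not stack:
        return False                      # no third clump, or p was not > q
    stack.pop()                           # the extra pop on the first back a
    while i < n:                          # third READ: loop on a's
        if w[i] != "a":
            return False                  # crash on a b
        i += 1
    return True                           # blank reached: accept


def in_Mpq(w: str) -> bool:
    m = re.fullmatch(r"(a+)(b+)(a+)", w)
    return bool(m) and len(m.group(1)) > len(m.group(2))


every = ("".join(t) for n in range(10) for t in itertools.product("ab", repeat=n))
print(all(accepts_Mpq(w) == in_Mpq(w) for w in every))  # True
```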
Let us also define another language:

$$M_{qp} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } q > p \text{ while } r \text{ is arbitrary}\}$$
$$= \{abba, abbaa, abbba, abbaaa, aabbba, \ldots\}$$

This language too is context-free because it can be generated by

S → XBA
X → aXb | ab
B → bB | b
A → aA | a

which we can interpret as

$$X \Rightarrow a^n b^n \qquad B \Rightarrow b^+ \qquad A \Rightarrow a^+$$

Together, this gives

$$(a^n b^n)(bb^*)(aa^*) = a^p b^q a^r, \text{ where } q > p$$



Let us also define the language

$$M_{pr} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } p > r \text{ while } q \text{ is arbitrary}\}$$
$$= \{aaba, aaaba, aabba, aaabaa, \ldots\}$$

This language is also context-free, because it can be generated by the CFG

S → AX
X → aXa | aBa
B → bB | b
A → aA | a

First, we observe

$$A \Rightarrow a^+ \quad \text{and} \quad B \Rightarrow b^+$$

Therefore, the X-part is of the form

$$a^n bb^* a^n$$

So, the words generated are of the form

$$(aa^*)(a^n bb^* a^n) = a^p b^q a^r, \text{ where } p > r$$

Let us also define the language

$$M_{rp} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } r > p \text{ while } q \text{ is arbitrary}\}$$
$$= \{abaa, abaaa, aabaaa, abbaaa, \ldots\}$$

One CFG for this language is

S → XA
X → aXa | aBa
B → bB | b
A → aA | a

which gives

$$A \Rightarrow a^+ \qquad B \Rightarrow b^+ \qquad X \Rightarrow a^n b^+ a^n$$
$$S \Rightarrow (a^n bb^* a^n)(aa^*) = a^p b^q a^r, \text{ where } r > p$$

We can see that this language too is the product of context-free languages when we show that $\{a^n b^+ a^n\}$ is context-free.

Let us also define the language

$$M_{qr} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } q > r \text{ while } p \text{ is arbitrary}\}$$
$$= \{abba, aabba, abbba, abbbaa, \ldots\}$$

One CFG for this language is

S → ABX
X → bXa | ba
B → bB | b
A → aA | a

which gives

$$(aa^*)(bb^*)(b^n a^n) = a^p b^q a^r, \text{ where } q > r$$
$$M_{qr} = (a^+)(b^+)(b^n a^n)$$
Let us also define

$$M_{rq} = \{a^p b^q a^r, \text{ where } p, q, r = 1, 2, 3, \ldots, \text{ but } r > q \text{ while } p \text{ is arbitrary}\}$$
$$= \{abaa, aabaa, abaaa, abbaaa, \ldots\}$$

One CFG that generates this language is

S → AXA
X → bXa | ba
A → aA | a

which gives

$$(aa^*)(b^n a^n)(aa^*) = a^p b^q a^r, \text{ where } r > q$$
$$M_{rq} = (a^+)(b^n a^n)(a^+)$$

We need to define one last language.

$$M = \{\text{the complement of the language defined by } aa^*bb^*aa^*\}$$
$$= \{\text{all words not of the form } a^p b^q a^r \text{ for } p, q, r = 1, 2, 3, \ldots\}$$
$$= \{\Lambda, a, b, aa, ab, ba, bb, aaa, aab, abb, baa, bab, \ldots\}$$

M is the complement of a regular language and therefore is regular by Theorem 11 (p. 1…), and all regular languages are context-free by Theorem 21 (p. 259).

Let us finally assemble the language L, the union of these seven languages:

$$L = M_{pq} + M_{qp} + M_{pr} + M_{rp} + M_{qr} + M_{rq} + M$$

L is context-free because it is the union of context-free languages (Theorem 36, p. 376).
What is the complement of L? All words that are not of the form

$$a^p b^q a^r$$

are in M, which is in L, so they are not in L'. This means that L' contains only words of the form

$$a^p b^q a^r$$

But what are the possible values of p, q, and r? If p > q, then the word is in $M_{pq}$, so it is in L and not L'. Also, if q > p, then the word is in $M_{qp}$, so it is in L and not L'. Therefore, p = q for all words in L'.

If q > r, then the word is in $M_{qr}$ and hence in L and not L'. If r > q, the word is in $M_{rq}$ and so in L and not L'. Therefore, q = r for all words in L'.

Because p = q and q = r, we know that p = r. Therefore, the words

$$a^n b^n a^n$$

are the only possible words in L'. All words of this form are in L' because none of them is in any of the M's. Therefore,

$$L' = \{a^n b^n a^n \text{ for } n = 1, 2, 3, \ldots\}$$

But we know that this language is non-context-free from Chapter 16. Therefore, we have constructed a CFL, L, that has a non-context-free complement.

We might observe that we did not need $M_{pr}$ and $M_{rp}$ in the formation of L. The union of the other five alone completely defines L. We included them only for the purpose of symmetry.
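The counting argument above can be cross-checked mechanically on short strings. In this Python sketch, each of the six comparison languages is represented by its defining inequality on (p, q, r), and membership in M means failing to have the shape $a^p b^q a^r$. The dictionary keys are our own labels, and the check covers only strings of bounded length.

```python
import itertools
import re


def shape(w: str):
    """(p, q, r) if w = a^p b^q a^r with p, q, r >= 1; otherwise None."""
    m = re.fullmatch(r"(a+)(b+)(a+)", w)
    return tuple(len(g) for g in m.groups()) if m else None


# The six M's built from comparisons; each applies only to words a^p b^q a^r.
CONDITIONS = {
    "M_pq": lambda p, q, r: p > q,
    "M_qp": lambda p, q, r: q > p,
    "M_pr": lambda p, q, r: p > r,
    "M_rp": lambda p, q, r: r > p,
    "M_qr": lambda p, q, r: q > r,
    "M_rq": lambda p, q, r: r > q,
}


def in_L(w: str, names=tuple(CONDITIONS)) -> bool:
    """Membership in M plus the union of the chosen comparison languages."""
    t = shape(w)
    if t is None:
        return True                   # w is in M, the complement of aa*bb*aa*
    return any(CONDITIONS[name](*t) for name in names)


def words(max_len: int):
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            yield "".join(letters)


print([w for w in words(9) if not in_L(w)])  # ['aba', 'aabbaa', 'aaabbbaaa']
```

Passing only four comparison names, as in `in_L(w, ("M_pq", "M_qp", "M_qr", "M_rq"))`, gives the same answers on these strings, in line with the remark that $M_{pr}$ and $M_{rp}$ are not needed.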
The fact that the complement of a CFL can be non-context-free is the reason that PDAs cannot be defined as deterministic if they are to correspond to all CFLs. Roughly speaking, we can operate on any deterministic machine and reverse its ACCEPT and REJECT conditions to convert it into a machine that accepts the complement of the language that was originally accepted. This halt-state reversal was illustrated in the Example (May) section of the preceding proof. Therefore, no deterministic pushdown automaton (DPDA) could accept the language $\{a^n b^n a^n\}'$ because its complement, $\{a^n b^n a^n\}$, would then be accepted by some other (derived) PDA, but this complement is non-context-free. Yet, because $\{a^n b^n a^n\}'$ can be generated by a CFG, we want it to be accepted by some PDA. This is why we were forced initially to define PDAs as nondeterministic machines.

The reason that we used the phrase “roughly speaking” in the previous paragraph is that the operation of converting even a deterministic PDA into a machine that accepts the complementary language is not as simple as merely reversing the symbols ACCEPT and REJECT in the picture of the machine. For one thing, all crash possibilities must first be eliminated and turned into edges leading peacefully to REJECT. But even then, reversing halt states might not create a machine in which all strings not previously accepted become accepted. This is because there is the possibility that some input strings, when fed into the original PDA, were neither accepted nor rejected but looped forever. Reversing ACCEPT and REJECT will then leave a machine on which these inputs still loop forever. To prove rigorously the theorem that the complement of a language accepted by a DPDA can also be accepted by a DPDA, we would have to show how to eliminate the loop-forever possibilities and turn them into trips to REJECT. We could do this, but it would be long.

MIXING CONTEXT-FREE AND REGULAR LANGUAGES 

The union of a context-free language and a regular language must be context-free because 
the regular language is itself context-free and Theorem 36 (p. 376) applies. As to whether or 
not the union is also regular, the answer is that it sometimes is and sometimes is not. If one 
language contains the other, then the union is the larger of the two languages whether it be 
the regular or the nonregular context-free language. 

EXAMPLE 

PALINDROME is nonregular context-free and (a + b)* is regular and contains it. The union 
is regular. On the other hand, PALINDROME contains the regular language a* and so the 
union of these two is nonregular context-free. ■ 

We can provide a more interesting pair of examples where one language is not contained in the other.

EXAMPLE 

The union of the nonregular context-free language $\{a^n b^n\}$ and the regular language b*a* is nonregular, as seen by the Myhill-Nerode theorem, because each string $a^n b$ belongs in a different class (for each there is a unique element of b* that completes a word in the union language).

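The Myhill-Nerode argument can be spot-checked: for each n, the suffix $b^{n-1}$ completes $a^n b$ to a word of the union, and it does not complete $a^m b$ for m ≠ n. In this Python sketch the function names are our own, and we assume $\{a^n b^n\}$ starts at n = 1; including n = 0 would change nothing, since Λ is already in b*a*.

```python
import re


def in_union(w: str) -> bool:
    """Membership in {a^n b^n : n >= 1} + b*a*."""
    m = re.fullmatch(r"(a+)(b+)", w)
    if m and len(m.group(1)) == len(m.group(2)):
        return True
    return re.fullmatch(r"b*a*", w) is not None


def distinguishes(x: str, y: str, z: str) -> bool:
    """z separates x and y when exactly one of xz, yz is in the language."""
    return in_union(x + z) != in_union(y + z)


# a^n b is completed by b^(n-1) alone, so any two of these strings are
# separated by the completion of one of them.
for n in range(1, 5):
    for m in range(1, 5):
        if n != m:
            assert distinguishes("a" * n + "b", "a" * m + "b", "b" * (n - 1))
print("a^n b for n = 1..4 lie in four different classes")
```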
The complement of a* is regular and does not contain all of PALINDROME (because aaa is in PALINDROME, e.g.), nor does PALINDROME contain all of it (because ba is in the complement of a*, e.g.). However, because PALINDROME does contain all of a*, the union of the complement of a* and PALINDROME is all strings, which is a regular language.

On the other hand, from what we have shown so far, we have no guarantee that the intersection of a context-free language and a regular language is even context-free, although it might even turn out to be regular. Certainly, if one is contained in the other, then the intersection will be the smaller language and have its property. But because we do not automatically know that the intersection of two context-free languages is context-free, the following theorem provides us with some nonobvious information.


THEOREM 41 

The intersection of a context-free language and a regular language is always context-free. 


PROOF 

We will prove this theorem by constructive algorithm. We start with a PDA for the context-free language, called PDAY, and an FA for the regular language, called FAX, with states x₁, x₂, x₃, . . . , and then we show how to construct a PDA for the intersection language, called INT. This construction will closely parallel the constructions given in the proof of Kleene's theorem that were later revealed to actually provide the basis of the intersection machine for two FAs (see p. 174).

Before we begin, let us assume that PDAY reads the entire input string before accepting the word. If it does not, then we use the algorithm of Theorem 29 (p. 311) to make it do so.

What we will do is label each of the states in PDAY with the name of the particular x-state in FAX that the input string would be in if it were being processed on FAX at the same time. The START state of PDAY we label with the START state of FAX, x₁. If from the START state on PDAY we go to a PUSH state, then, because we have not yet read any input letters, the FAX simulation leaves us still in x₁. If we now go into a POP state in PDAY, we would still not have read any input letters, and the string would remain in x₁ on FAX.

Now if we do enter a READ state in PDAY, we still are in the FAX state we were formerly in, but as we leave the READ state by a- or b-edges, it will correspond to entering (possibly) new states in the FAX simulation. Remember that PDAY is a (possibly) nondeterministic machine, and so there may be several a-edges leaving the READ state, but we label each of the states they take us to with the x-state from FAX that an a-edge takes us to.

We could find another complication. In FAX, an a-edge takes us to x₃, whereas a b-edge takes us to x₈, but in PDAY both the a-edge and b-edge take us to the same PDAY state. This PDAY state must then be cloned; that is, two copies of it must be produced with identical sets of exiting edges but not entering edges. One of the clones will be the one the a-edge enters, and it will get the label x₃, whereas the other will be entered by the b-edge and get the label x₈. We continue to label the PDAY states with the corresponding FAX states. However, as we revisit a PDA state that is already labeled, it may have to be cloned again if it does not have the appropriate corresponding FAX state label. For example, if a POP state was already labeled with x₂ because of one way in which it was entered, it may happen to also be entered from a READ labeled x₉ by a b-edge and, unfortunately, a b-edge from x₉ on FAX takes us to x₉ again, so we cannot happily enter this particular POP state. The answer is then that the POP state we enter must be labeled x₉ and be a clone of the POP-x₂ state.

To show that this algorithm is actually finite and does not create infinitely many new states, what we can do simply is name all the states in PDAY as $y_1, y_2, y_3, \ldots$ and simultaneously create all possible combinations of y this and x that, and connect them by the rules of both PDAY and FAX appropriately. That is, if we are in $y_p$ and $x_q$, and it is a READ state in PDAY (or else we do not change our x-status), and we read a b, then because PDAY says “if in $y_p$ and reading a b, go to $y_r$” and FAX says “if in $x_q$ and reading a b, go to $x_s$,” we go to the new states $y_r$ and $x_s$. This then, in a finite number of steps, almost completes the construction of our proposed intersection machine INT.

The construction is not yet complete because we did not explain that something special must happen to the ACCEPT states in order to be sure that the only words INT accepts are those accepted by both PDAY and FAX. If the processing of an input string terminates in an ACCEPT state that is labeled with an $x_m$ that is not a final state in FAX, then the input would not be accepted on both machines. We must change all ACCEPT states that are labeled with nonfinal x-states into REJECTs. Now if a string is run on INT and reaches an ACCEPT state, we know it will be accepted by both component machines and is truly in the intersection language. ■

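The “all combinations” version of the construction is easy to express in code. The Python sketch below assumes a made-up edge format (src, kind, letter, dst) for the PDA and a transition dictionary for the FA; these are illustrative, not a standard notation, and the toy PDA in the usage lines is hypothetical. It pairs every PDA state with every FA state, lets only READ edges on real letters move the x-label, and separates the ACCEPT pairs into those that survive and those that become REJECTs. Unreachable pairs are created too, exactly as in the argument above; a practical version would prune them.

```python
from itertools import product


def build_int(pda_states, pda_edges, fa_states, fa_delta, fa_final,
              accept="ACCEPT"):
    """Pair every PDA state y with every FA state x and connect the pairs by
    the rules of both machines. pda_edges holds (src, kind, letter, dst).
    Only a READ edge on a letter the FA knows moves the x-label; every other
    edge (START, PUSH, POP, or a READ of the blank) keeps it."""
    states = set(product(pda_states, fa_states))
    edges = set()
    for (src, kind, letter, dst), x in product(pda_edges, fa_states):
        if kind == "READ" and (x, letter) in fa_delta:
            new_x = fa_delta[(x, letter)]
        else:
            new_x = x
        edges.add(((src, x), kind, letter, (dst, new_x)))
    # ACCEPT pairs with a nonfinal x-label are turned into REJECTs.
    accepting = {s for s in states if s[0] == accept and s[1] in fa_final}
    rejecting = {s for s in states if s[0] == accept and s[1] not in fa_final}
    return states, edges, accepting, rejecting


# Hypothetical toy PDA: read a's, accept on the blank Δ.
pda_states = ["START", "READ1", "ACCEPT"]
pda_edges = [("START", "START", None, "READ1"),
             ("READ1", "READ", "a", "READ1"),
             ("READ1", "READ", "Δ", "ACCEPT")]
fa_delta = {("x1", "a"): "x2", ("x1", "b"): "x1",
            ("x2", "a"): "x2", ("x2", "b"): "x1"}
states, edges, accepting, rejecting = build_int(
    pda_states, pda_edges, ["x1", "x2"], fa_delta, {"x2"})
print(len(states), sorted(accepting), sorted(rejecting))
```

This prints `6 [('ACCEPT', 'x2')] [('ACCEPT', 'x1')]`: six pairs in all, with the ACCEPT pair labeled by the nonfinal state x1 demoted to REJECT.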

EXAMPLE 

Let C be the language EQUAL of words with the same total number of a's and b's. Let the PDA to accept this language be

[Figure: PDA for EQUAL.]

This is a new machine to us, so we should take a moment to dissect it. At every point in the processing of an input string, the STACK will contain whichever letter has been read more, a or b, and will contain as many of that letter as the number of extra times it has been read. If we have read from the TAPE six more b's than a's, then we shall find six b's in the STACK. If the STACK is empty at any time, it means an equal number of a's and b's have been read.

The process begins in START and then goes to READ₁. Whatever we read in READ₁ is our first excess letter and is pushed onto the STACK. The rest of the input string is read in READ₂. If during the processing we read an a, we go and consult the STACK. If the STACK contains excess b's, then one of them will be cancelled against the a we just read (POP₁-READ₂). If the STACK is empty, then the a just read is pushed onto the STACK as a new excess letter itself. If the STACK is found to contain a's already, then we must replace the one we popped out for testing as well as add the new one just read to the amount of total excess a's in the STACK. In all, two a's must be pushed onto the STACK.

When we are finally out of input letters in READ₂, we go to POP₃ to be sure there are no excess letters being stored in the STACK. Then we accept.

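This machine reads the entire INPUT TAPE before accepting and never loops forever. Let us intersect this with the FA below that accepts all words ending in the letter a:

[Figure: FA with start state x₁ and final state x₂; an a-edge from either state goes to x₂, and a b-edge from either state goes to x₁.]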

Now let us manufacture the joint intersection machine. We cannot move out of x₁ until after the first READ in the PDA.

[Figure: the beginning of the intersection machine, labeled x₁.]

At this point in the PDA, we branch to separate PUSH states, each of which takes us to READ₂. However, depending on what is read in READ₁, we will either want to be in READ₂ and x₁, or READ₂ and x₂, so these must be two different states:

[Figure: the two states READ₂, x₁ and READ₂, x₂.]
From READ₂ and x₂, if we read an a, we shall have to be in POP₁ and x₂, whereas if we read a b, we shall be in POP₂ and x₁. In this particular machine, there is no need for POP₁ and x₁ because POP₁ can only be entered by reading an a and x₁ can only be entered by reading a b. For analogous reasons, we do not need a state called POP₂ and x₂ either.

We shall theoretically need both POP₃ and x₁ and POP₃ and x₂ because we have to keep track of the last input letter. But even if POP₃ and x₁ should happen to pop a Δ, it cannot accept the input because x₁ is not a final state, and so the word ending there is rejected by the FA. Therefore, we do not even bother drawing POP₃ and x₁. If a blank is read in READ₂, x₁, the machine peacefully crashes.


The whole machine looks like this:

[Figure: the complete intersection machine.]



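Because this EQUAL machine happens to be deterministic, the intersection machine can be imitated by running it and the FA in lockstep. In this Python sketch, the current STACK contents together with the current x-label stand in for the labeled states of the drawing; popping an empty STACK is represented by None, and READ₁ is folded into the same loop as READ₂ since both push onto an empty STACK. These representation choices are ours.

```python
# FA for "words ending in a": start state x1, final state x2.
FA_DELTA = {("x1", "a"): "x2", ("x1", "b"): "x1",
            ("x2", "a"): "x2", ("x2", "b"): "x1"}
FA_FINAL = {"x2"}


def run_int(w: str) -> bool:
    """Run the EQUAL machine and the FA side by side on w."""
    stack = []
    x = "x1"                          # START, PUSH and POP states keep the x-label
    for letter in w:                  # every READ of a letter moves the FA
        x = FA_DELTA[(x, letter)]
        top = stack.pop() if stack else None   # consult the STACK
        if top is None:
            stack.append(letter)               # empty: a new excess letter
        elif top == letter:
            stack += [letter, letter]          # put the test letter back, add one
        # otherwise top is the other letter and cancels the one just read
    # POP3 must find the STACK empty, and only an x2 label may accept.
    return not stack and x in FA_FINAL


print([w for w in ["ab", "ba", "abba", "baab", "aabb"] if run_int(w)])  # ['ba', 'abba']
```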
EXAMPLE 

Let us reconsider the language DOUBLEWORD, which was shown in the previous chapter to be non-context-free. We can provide another proof of this fact by employing our last theorem. Let us assume for a moment that DOUBLEWORD were a CFL. Then when we intersect it with any regular language, we must get a context-free language.

Let us intersect DOUBLEWORD with the regular language defined by

aa*bb*aa*bb*

A word in the intersection must have both forms; this means it must be

ww, where $w = a^n b^m$ for some n, m = 1, 2, 3, . . .

This observation may be obvious, but we shall prove it anyway. If w contained the substring ba, then ww would have two of them, but all words in aa*bb*aa*bb* have exactly one such substring. Therefore, the substring ba must be the crack in between the two w's in the form ww. This means w begins with a and ends with b. Because it has no ba, it must be $a^n b^m$.

The intersection language is therefore

$$\{a^n b^m a^n b^m\}$$
But we showed in the last chapter that this language was non-context-free. Therefore, DOUBLEWORD cannot be context-free either.

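As with the other examples, the identity can be checked on short strings. This Python sketch (the names are ours) verifies that, for every string over {a, b} of length at most 12, being in DOUBLEWORD and in aa*bb*aa*bb* is the same as being of the form $a^n b^m a^n b^m$.

```python
import itertools
import re


def is_doubleword(w: str) -> bool:
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:]


def in_regular(w: str) -> bool:
    return re.fullmatch(r"a+b+a+b+", w) is not None


def in_target(w: str) -> bool:
    """{a^n b^m a^n b^m : n, m >= 1}"""
    m = re.fullmatch(r"(a+)(b+)(a+)(b+)", w)
    return (m is not None
            and len(m.group(1)) == len(m.group(3))
            and len(m.group(2)) == len(m.group(4)))


every = ["".join(t) for n in range(13) for t in itertools.product("ab", repeat=n)]
print(all((is_doubleword(w) and in_regular(w)) == in_target(w) for w in every))  # True
```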
PROBLEMS

1. Find CFGs for these languages:

(i) All words that start with an a or are of the form $a^n b^n$.

(ii) All words that have an equal number of a's and b's or are of the form $a^n b^n$.

(iii) All words in EVEN-EVEN*.

(iv) All words of the form

$$a^n b^n a^m b^m, \text{ where } n, m = 1, 2, 3, \ldots, \text{ but } m \text{ need not equal } n$$
$$= \{abab, aabbab, abaabb, aaaabbbbab, aaabbbaaabbb, \ldots\}$$

2. Find CFGs for these languages: 

(i) All words of the form 

a x b v a\ where jcj,z= 12 3... and x + z = y 
= {abba aabbba abbbaa aabbbbaa . . .} 

Hint: Concatenate a word of the form a n b n with a word of the form b m a m . 

(ii) All words of the form 

a^x b^y a^z, where x, y, z = 1 2 3 . . . and y = 2x + 2z

= {abbbba abbbbbbaa aabbbbbba . . .}

(iii) All words of the form 

a^x b^y a^z, where x, y, z = 1 2 3 . . . and y = 2x + z

= {abbba abbbbaa aabbbbba . . .}

(iv) All words of the form 

a^x b^y a^z b^w, where x, y, z, w = 1 2 3 . . .

and y > x and z > w and
x + z = y + w

Hint: Think of these words as

(a^x b^x)(b^{y-x} a^{y-x})(a^w b^w)

(v) What happens if we throw away the restrictions y > x and z > w? 

3. (i) Find a CFG for the language of all words of the form 

a^n b^n or b^n a^n, where n = 1 2 3 . . .

(ii) Is the Kleene closure of the language in part (i) the language of all words with
an equal number of a's and b's that we have called EQUAL?

(iii) Using the algorithm from Theorem 38 (p. 384), find the CFG that generates the
closure of the language in part (i).

(iv) Compare this to the CFG for the language EQUAL given before (p. 239). 

(v) Write out all the words in 

(language of part (i))* 
that have eight or fewer letters. 
4. Use the results of Theorems 36, 37, and 38 and a little ingenuity and the recursive definition of regular languages to provide a new proof that all regular languages are context-free.

5. (i) Find a CFG for the language 

L_1 = a(bb)*

(ii) Find a CFG for the language L_1*.

(iii) Find a CFG for the language L_2 = (bb)*a.

(iv) Find a CFG for L_2*.

(v) Find a CFG for 

L_3 = bba*bb + bb

(vi) Find a CFG for L_3*.

(vii) Find a CFG for 

L_1* + L_2* + L_3*

(viii) Compare the CFG in part (vii) to 

S → aS | bbS | Λ
Show that they generate the same language. 

6. A substitution is the action of taking a language L and two strings of terminals called s_a
and s_b and changing every word of L by substituting the string s_a for each a and the
string s_b for each b in the word. This turns L into a completely new language. Let us say,
for example, that L was the language defined by the regular expression 

a*(bab* + aa)* 

and say that 

s_a = bb,  s_b = a

Then L would become the language defined by the regular expression 

(bb)*(abba* + bbbb)* 

(i) Prove that after any substitution any regular language is still regular. 

(ii) Prove that after any substitution a CFL is still context-free. 

7. Find PDAs that accept 

(i) {a^n b^m, where n, m = 1 2 3 . . . and n ≠ m}

(ii) {a^x b^y a^z, where x, y, z = 1 2 3 . . . and x + z = y}

(iii) L_1L_2, where

L_1 = all words with a double a
L_2 = all words that end in a

8. (i) Some may think that the machine argument that tried to prove Theorem 37 (p. 381)
could be made into a real proof by using the algorithms of Theorem 29 (p. 311) to
convert the first machine into one that empties its STACK and TAPE before accepting.
If while emptying the TAPE, a nondeterministic leap is made to the START
state of the second machine, it appears that we can accept exactly the language
L_1L_2. Demonstrate the folly of this belief.

(ii) Show that Theorem 37 can have a machine proof if the machines are those developed in Theorem 30 (p. 318).

(iii) Provide a machine proof for Theorem 38 (p. 384). 

9. Which of the following are context-free? 

(i) (a)(a + b)* ∩ ODDPALINDROME

(ii) EQUAL ∩ {a^n b^n a^n}

(iii) {a^n b^n} ∩ PALINDROME'

(iv) EVEN-EVEN' ∩ PALINDROME

(v) {a^n b^n}' ∩ PALINDROME

(vi) PALINDROME ∩ {a^n b^{n+m} a^m, where n, m = 1 2 3 . . . , n = m or n ≠ m}

(vii) PALINDROME' ∩ EQUAL

10. For the example on p. 389, 

(i) Build a PDA for M as defined earlier. 

(ii) Show that [a n b + a n ] is a CFL. 

(iii) Build a PDA for M as defined earlier. 

(iv) Build a PDA for M_rq as defined earlier.

(v) Build a PDA for M as defined earlier. 

11. (i) Show that

L_1 = {a^p b^q a^r b^p, where p, q, r are arbitrary whole numbers}

is context-free.

(ii) Show that

L_2 = {a^p b^q a^p b^s}

is context-free.

(iii) Show that

L_3 = {a^p b^p a^r b^s}

is context-free.

(iv) Show that

L_1 ∩ L_2 ∩ L_3

is non-context-free.
12. Recall the language VERYEQUAL over the alphabet Σ = {a b c}:

VERYEQUAL = {all strings of a's, b's, and c's that have the
same total number of a's as b's as c's}

Prove that VERYEQUAL is non-context-free by using a theorem in this chapter. (Compare with Chapter 20, Problem 19.)

13. (i) Prove that the complement of the language L

L = {a^n b^m, where n ≠ m}

is context-free, but that neither L nor L' is regular.

(ii) Show that

L_1 = {a^n b^m, where n > m}

and

L_2 = {a^n b^m, where m > n}

are both context-free and not regular.

(iii) Show that their union is context-free and nonregular.

(iv) Show that their intersection is regular.

14. (i) Prove that the language

L_1 = {a^n b^m a^{n+m}}

is context-free.

(ii) Prove that the language

L_2 = {a^n b^n a^m, where either n = m or n ≠ m}
is context-free. 

(iii) Is their intersection context-free? 

15. In this chapter, we proved that the complement of {a^n b^n a^n} is context-free. Prove this
again by exhibiting one CFG that generates it. 

16. Let L be a CFL. Let R be a regular language contained in L. Let L − R represent the language of all words of L that are not words of R. Prove that L − R is a CFL.

17. The algorithm given in the proof of Theorem 41 (p. 394) looks mighty inviting. We are
tempted to use the same technique to build the intersection machine of two PDAs. However, we know that the intersection of two CFLs is not always a CFL. Explain why the
algorithm fails when it attempts to intersect two PDAs.

18. (i) Take a PDA for PALINDROMEX and intersect it with an FA for a*Xa*. (This
means actually build the intersection machine.)

(ii) Analyze the resultant machine and show that the language it accepts is {a^n X a^n}.

19. (i) Intersect a PDA for {a^n b^n} with an FA for a(a + b)*. What language is accepted by
the resultant machine?

(ii) Intersect a PDA for {a^n b^n} with an FA for b(a + b)*. What language is accepted by
the resultant machine?

(iii) Intersect a PDA for {a^n b^n} with an FA for (a + b)*aa(a + b)*.

(iv) Intersect a PDA for {a^n b^n} with an FA for EVEN-EVEN.

20. Intersect a PDA for PALINDROME with an FA that accepts the language of all words 
of odd length. Show, by examining the machine, that it accepts exactly the language 
ODDPALINDROME. 
Decidability


EMPTINESS AND USELESSNESS

In Part II, we have been laying the foundation of the theory of formal languages. Among the 
many avenues of investigation we have left open are some questions that seem very natural 
to ask, such as the following: 

1. How can we tell whether or not two different CFGs define the same language? 

2. Given a particular CFG, how can we tell whether or not it is ambiguous? 

3. Given a CFG that is ambiguous, how can we tell whether or not there is a different CFG 
that generates the same language but is not ambiguous? 

4. How can we tell whether or not the complement of a given context-free language is also 
context-free? 

5. How can we tell whether or not the intersection of two context-free languages is also 
context-free? 

6. Given two context-free grammars, how can we tell whether or not they have a word in 
common? 

7. Given a CFG, how can we tell whether or not there are any words that it does not generate? (Is its language (a + b)* or not?)

These are very fine questions, yet, alas, they are all unanswerable. There are no algorithms to resolve any of these questions. This is not because computer theorists have been
too lazy to find them. No algorithms have been found because no such algorithms exist,
anywhere, ever.

We are using the word “exist” in a special philosophical sense. Things that have not yet
been discovered but that can some day be discovered we still call existent, as in the sentence,
“The planet Jupiter existed long before it was discovered by man.” On the other hand, certain concepts lead to mathematical contradictions, so they cannot ever be encountered, as in,
“The planet on which 2 + 2 = 5,” “The smallest planet on which 2 + 2 = 5,” or “The tallest
married bachelor.” In Part III, we shall show how to prove that some computer algorithms
are just like married bachelors in that their very existence would lead to unacceptable contradictions. Suppose we have a question that requires a decision procedure. If we prove that no
algorithm can exist to answer it, we say that the question is undecidable. Questions 1
through 7 are undecidable.
This is not a totally new concept to us; we have seen it before, but not with this terminology. In geometry, we have learned how to bisect an angle given a straightedge and compass. We cannot do this with a straightedge alone. No algorithm exists to bisect an angle using just a straightedge. We have also been told (although the actual proof is quite advanced)
that even with a straightedge and compass we cannot trisect an angle. Not only is it true that
no one has ever found a method for trisecting an angle, nobody ever will. And that is a theorem that has been proven.

We shall not present the proof that questions 1 through 7 are undecidable, but toward 
the end of the book we will prove something very similar. 

What Exists                                    What Does Not Exist

1. What is known                               1. Married bachelors

2. What will be known                          2. Algorithms for questions 1 through 7 above

3. What might have been known but nobody       3. A good 5¢ cigar
   will ever care enough to figure it out

There are, however, some other fundamental questions about CFGs that we can answer: 

1. Given a CFG, can we tell whether or not it generates any words at all? This is the question of emptiness.

2. Given a CFG, can we tell whether or not the language it generates is finite or infinite? 
This is the question of finiteness. 

3. Given a CFG and a particular string of letters w, can we tell whether or not w can be 
generated by the CFG? This is the question of membership. 

Now we have a completely different story. The answer to each of these three easier 
questions is “yes.” Not only do algorithms to make these three decisions exist, but they are 
right here on these very pages. 

THEOREM 42 

Given any CFG, there is an algorithm to determine whether or not it can generate any words 
at all. 

PROOF 

The proof will be by constructive example. We show there exists such an algorithm by presenting one.

In Theorem 23 of Chapter 13, we showed that every CFG that does not generate Λ can
be written without Λ-productions.

In that proof, we showed how to decide which nonterminals are nullable. The word Λ is
a word generated by the CFG if and only if S is nullable. We already know how to decide
whether the start symbol S is nullable.
Therefore, the problem of determining whether Λ is a word in the language of any CFG
has already been solved.

Let us assume now that Λ is not a word generated by the CFG. In that case, we can convert the CFG to CNF, preserving the entire language.

If there is a production of the form

S → t

where t is a terminal, then t is a word in the language.

If there are no such productions, we then propose the following algorithm: 

Step 1 For each nonterminal N that has some productions of the form

N → t

where t is a terminal or string of terminals, we choose one of these productions
and throw out all other productions for which N is on the left side. We then
replace N by t in all the productions in which N is on the right side, thus eliminating the nonterminal N altogether. We may have changed the grammar so that it
no longer accepts the same language. It may no longer be in CNF. That is fine
with us. Every word that can be generated from the new grammar could have
been generated by the old CFG. If the old CFG generated any words, then the
new one does also.

Step 2 Repeat step 1 until either it eliminates S or it eliminates no new nonterminals.
If S has been eliminated, then the CFG produces some words; if not, then it
does not. (This we need to prove.)


The algorithm is clearly finite, because it cannot run step 1 more times than there are
nonterminals in the original CNF version. The string of terminals that will eventually
replace S is a word that could have been derived from S if we retraced in reverse the exact
sequence of steps that led from the terminals to S.

If step 2 makes us stop while we still have not replaced S, then we can show that no
words are generated by this CFG. If there were any words in the language, we could retrace
the tree from any word and follow the path back to S.

For example, if we have the derivation tree 


[Figure: a derivation tree for the word abbbb]

then we can trace backward as follows (the relevant productions can be read from the tree):

B → b

must be a production, so replace all B's with b's.

Y → BB

is a production, so replace Y with bb.

A → a

is a production, so replace A with a.

X → AY

is a production, so replace X with abb.

S → XY

is a production, so replace S with abbbb.

Even if the grammar included some other production, such as,

B → d (where d is some other terminal)

we could still retrace the derivation from abbbb to S, but we could just as well end up replacing S by adddd, if we chose to begin the backup by replacing all B's by d instead of b.

The important fact is that some sequence of backward replacements will reach back to S 
if there is any word in the language. 

The proposed algorithm is therefore a decision procedure. ■ 

EXAMPLE 

Consider this CFG: 

S → XY

X → AX

X → AA

A → a

Y → BY

Y → BB

B → b

Step 1 Replace all A's by a and all B's by b. This gives

S → XY

X → aX

X → aa

Y → bY

Y → bb

Step 1 Replace all X's by aa and all Y's by bb

S → aabb

Step 1 Replace all S's by aabb.

Step 2 Terminate step 1 and discover that S has been eliminated. Therefore, the CFG 
produces at least one word. ■ 
EXAMPLE


Consider this CFG:

S → XY
X → AX
A → a
Y → BY
Y → BB
B → b

Step 1 Replace all A's by a and all B's by b. This gives


S → XY

X → aX

Y → bY

Y → bb


Step 1 Replace all Y's by bb. This gives

S → Xbb

X → aX


Step 2 Terminate step 1 and discover that S is still there. This CFG generates no 
words. ■ 

As a final word on this topic, we should note that this algorithm does not depend on the 
CFGs being in CNF, as we shall see in the problems at the end of this chapter. 
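The substitution procedure of Theorem 42 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the text: it assumes a hypothetical representation in which a grammar is a dictionary from a one-letter nonterminal (an uppercase letter) to its list of right sides, with terminals written as lowercase letters, and `find_word` is an invented name. As the remark above notes, the procedure does not need CNF. The function returns the terminal string that finally replaces S (a word of the language) or None when step 2 stops with S still present.

```python
from typing import Dict, List, Optional

# Hypothetical representation: nonterminal -> list of right sides.
# Uppercase letters are nonterminals, lowercase letters are terminals.
Grammar = Dict[str, List[str]]

def find_word(grammar: Grammar, start: str = "S") -> Optional[str]:
    rules = {n: list(rhss) for n, rhss in grammar.items()}
    while True:
        # Step 1: look for some production N -> t with t all terminals.
        chosen = next(((n, rhs) for n, rhss in rules.items() for rhs in rhss
                       if not any(c.isupper() for c in rhs)), None)
        if chosen is None:
            return None          # Step 2: no new nonterminal can be eliminated
        n, t = chosen
        if n == start:
            return t             # S has been eliminated; t is a word
        del rules[n]             # throw out every other production for N
        for m in rules:          # replace N by t on every right side
            rules[m] = [rhs.replace(n, t) for rhs in rules[m]]

# The two example grammars above.
FIRST = {"S": ["XY"], "X": ["AX", "AA"], "A": ["a"], "Y": ["BY", "BB"], "B": ["b"]}
SECOND = {"S": ["XY"], "X": ["AX"], "A": ["a"], "Y": ["BY", "BB"], "B": ["b"]}

print(find_word(FIRST))   # prints aabb
print(find_word(SECOND))  # prints None
```

Each pass either returns or deletes one nonterminal, so the loop runs at most once per nonterminal, which mirrors the finiteness argument in the proof.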

We have not yet gotten all the mileage out of the algorithm in the previous theorem. We 
can use it again to prove the following. 


THEOREM 43 

There is an algorithm to decide whether or not a given nonterminal X in a given CFG is ever
used in the generation of words.


PROOF 

The first thing we want to decide is whether from X we can possibly derive a string of all terminals. Then we need also to decide whether, starting from S, we can derive a working string
involving X that will lead to a word.

To see whether we can produce a string of all terminals from the nonterminal X, we can 
make use of the previous theorem and a clever trick. 


Just for a moment, interchange S and X in all the production rules in the grammar. Now use the
algorithm of the previous theorem to see whether this grammar produces any words from its
start symbol. If it does, then X in the nontampered original grammar can produce a string of
all terminals.

Let us call a nonterminal that cannot ever produce a string of terminals unproductive. 
The algorithm that will answer whether X is ever used in the production of words from S
will require blue paint.

Step 1 Find all unproductive nonterminals. 

Step 2 Purify the grammar by eliminating all productions involving the unproductive 
nonterminals. 

Step 3 Paint all X’s blue. 

Step 4 If any nonterminal is the left side of a production with anything blue on the 
right, paint it blue, and paint all occurrences of it throughout the grammar blue, 
too. 

Step 5 The key to this approach is that all the remaining productions are guaranteed to
terminate. This means that any blue on the right gives us blue on the left (not
just all blue on the right). Repeat step 4 until nothing new is painted blue.

Step 6 If S is blue, X is a useful member of the CFG, because there are words with derivations that involve X-productions. If not, X is not useful.

Obviously, this algorithm is finite, because the only repeated part is step 4 and that can 
be repeated only as many times as there are nonterminals in the grammar. 

It is also clear that if X is used in the production of some word, then S will be painted
blue, because if we have

S ⇒ . . . ⇒ (blah) X (blah) ⇒ . . . ⇒ word

then the nonterminal that put X into the derivation in the first place will be blue, and the nonterminal that put that one in will be blue, and the nonterminal from which that came will be
blue . . . up to S.

Now let us say that S is blue. Let us say that it caught the blue through this sequence: X 
made A blue, A made B blue, and B made C blue ... up to S. The production in which X 
made A blue looked like this: 

A → (blah) X (blah)

Now the two (blah)'s might not be strings of terminals, but it must be true that any nonterminals in the (blah)'s can be turned into strings of terminals because they survived step 2.
So, we know that there is a derivation from A to a string made up of X with terminals

A ⇒* (string of terminals) X (string of terminals)

We also know that there is a production of the form

B → (blah) A (blah)

that can likewise be turned into

B ⇒* (string of terminals) A (string of terminals)

⇒* (string of terminals) X (string of terminals)

We now back all the way up to S and realize that there is a derivation

S ⇒* (string of terminals) X (string of terminals)

⇒* (word)

Therefore, this algorithm is exactly the decision procedure we need to decide whether X 
is actually ever used in the production of a word in this CFG. ■ 
A nonterminal that cannot be used in a production of a word is called useless. Theorem
43 says that uselessness is decidable.


EXAMPLE 


Consider the CFG

S → ABa | bAZ | b
A → Xb | bZa
B → bAA
X → aZa | aaa
Z → ZAbA

We quickly see that X terminates (goes to all terminals, whether or not it can be reached
from S). Z is useless (because it appears in all of its own productions). A is blue. B is blue. S is
blue. So, X must be involved in the production of words. To see one such word, we can write out a derivation.

Now because A is useful, it must produce some string of terminals. In fact,

A ⇒* aaab

so

B ⇒* bAaaab
⇒ bXbaaab

and

S ⇒ ABa
⇒* aaabBa
⇒* aaabbXbaaaba

We know that X is productive, so this is a working string in the derivation of an actual word
in the language of this grammar.
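The blue-paint procedure of Theorem 43 can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's code. It uses a hypothetical dictionary representation (one uppercase letter per nonterminal, lowercase letters for terminals), and the helper names are invented. One substitution is worth flagging: instead of the S-and-X interchange trick, step 1 here finds the productive nonterminals with a direct fixpoint computation (repeatedly marking any nonterminal that has a right side using only terminals and already-marked nonterminals). Steps 2 through 6 follow the text.

```python
from typing import Dict, List, Set

# Hypothetical representation: nonterminal -> list of right sides,
# uppercase letters are nonterminals and lowercase letters are terminals.
Grammar = Dict[str, List[str]]

def uses_only(rhs: str, allowed: Set[str]) -> bool:
    """True if every nonterminal in rhs belongs to `allowed`."""
    return all(not c.isupper() or c in allowed for c in rhs)

def productive(grammar: Grammar) -> Set[str]:
    """Step 1 (fixpoint version): nonterminals that derive a terminal string."""
    prod: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for n, rhss in grammar.items():
            if n not in prod and any(uses_only(rhs, prod) for rhs in rhss):
                prod.add(n)
                changed = True
    return prod

def is_useful(grammar: Grammar, x: str, start: str = "S") -> bool:
    prod = productive(grammar)
    if x not in prod:
        return False  # x can never finish a derivation
    # Step 2: purify, dropping productions that mention unproductive symbols.
    pure = {n: [rhs for rhs in rhss if uses_only(rhs, prod)]
            for n, rhss in grammar.items() if n in prod}
    # Steps 3-5: paint x blue, then spread blue from right sides to left sides.
    blue = {x}
    changed = True
    while changed:
        changed = False
        for n, rhss in pure.items():
            if n not in blue and any(c in blue for rhs in rhss for c in rhs):
                blue.add(n)
                changed = True
    return start in blue  # Step 6

EXAMPLE = {"S": ["ABa", "bAZ", "b"], "A": ["Xb", "bZa"], "B": ["bAA"],
           "X": ["aZa", "aaa"], "Z": ["ZAbA"]}
print(sorted(productive(EXAMPLE)))
print(is_useful(EXAMPLE, "X"), is_useful(EXAMPLE, "Z"))
```

On the example grammar this finds S, A, B, and X productive and Z unproductive, and it paints X, A, B, and S blue, matching the reasoning above.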


FINITENESS 

The last two theorems have been part of a project, designed by Bar-Hillel, Perles, and
Shamir, to settle a more important question.


THEOREM 44 

There is an algorithm to decide whether a given CFG generates an infinite language or a
finite language.


PROOF 


The proof will be by constructive algorithm. We shall show that there exists such a procedure
by presenting one. If any word in the language is long enough to apply the pumping lemma
(Theorem 34, p. 360) to, we can produce an infinite sequence of new words in the language.

If the language is infinite, then there must be some words long enough so that the pumping lemma applies to them. Therefore, the language of a CFG is infinite if and only if the
pumping lemma can be applied.

The essence of the pumping lemma was to find a self-embedded nonterminal X. We 
shall show in a moment how to tell whether a particular nonterminal is self-embedded, but 
first we should also note that the pumping lemma will work only if the nonterminal that we 
pump is involved in the derivation of any words in the language. Without the algorithm of 
Theorem 43, we could be building larger and larger trees, none of which are truly derivation 
trees. For example, in the CFG 

S → aX | b
X → XXb


the nonterminal X is certainly self-embedded, but the language is finite nonetheless. 
So, the algorithm is as follows:

Step 1 Use the algorithm of Theorem 43 to determine which nonterminals are useless.
Eliminate all productions involving them.

Step 2 Use the following algorithm to test each of the remaining nonterminals, in turn,
to see whether they are self-embedded. When a self-embedded one is discovered, stop.

To test X:

(i) Change all X's on the left side of productions into the Russian letter Ж,
but leave all X's on the right side of productions alone.

(ii) Paint all X's blue.

(iii) If Y is any nonterminal that is the left side of any production with some
blue on the right side, then paint all Y's blue.

(iv) Repeat step 2(iii) until nothing new is painted blue.

(v) If Ж is blue, then X is self-embedded; if not, it is not.

Step 3 If any nonterminal left in the grammar after step 1 is self-embedded, the language generated is infinite. If not, then the language is finite.


The explanation of why this procedure is finite and works is identical to the explanation
in the proof of Theorem 43. ■


EXAMPLE

Consider the grammar

S → ABa | bAZ | b
A → Xb | bZa
B → bAA
X → aZa | bA | aaa
Z → ZAbA

This is the grammar of the previous example with the additional production X → bA. As before, Z is useless, while all other nonterminals are used in the production of words. We now
test to see whether X is self-embedded.

First, we trim away Z:

S → ABa | b
A → Xb
B → bAA
X → bA | aaa

Now we introduce Ж:

S → ABa | b
A → Xb
B → bAA
Ж → bA | aaa

Now the paint:

X is blue
A → Xb, so A is blue
Ж → bA, so Ж is blue
B → bAA, so B is blue
S → ABa, so S is blue

Conclusion: Ж is blue, so the language generated by this CFG is infinite.
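The finiteness test of Theorem 44 can be sketched the same way. This Python version is illustrative, not the book's code. It assumes the same hypothetical dictionary representation and also assumes the grammar has no Λ-productions and no unit productions (true of CNF and of this example). Without that assumption, a unit cycle such as X → Y, Y → X would paint Ж blue even though nothing is pumped. Step 1 is implemented by keeping the nonterminals that are productive and reachable from S in the purified grammar. That is the same set the blue-paint test of Theorem 43 keeps, because paint spreading from X up to S is exactly a chain of productions leading from S down to X. Step 2 renames the left side of X to Ж just as the text describes.

```python
from typing import Dict, List, Set

# Hypothetical representation: nonterminal -> list of right sides.
Grammar = Dict[str, List[str]]

def uses_only(rhs: str, allowed: Set[str]) -> bool:
    return all(not c.isupper() or c in allowed for c in rhs)

def trim_useless(grammar: Grammar, start: str = "S") -> Grammar:
    """Step 1: keep only nonterminals used in the production of words."""
    prod: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for n, rhss in grammar.items():
            if n not in prod and any(uses_only(r, prod) for r in rhss):
                prod.add(n)
                changed = True
    pure = {n: [r for r in rhss if uses_only(r, prod)]
            for n, rhss in grammar.items() if n in prod}
    reach: Set[str] = set()
    stack = [start] if start in pure else []
    while stack:
        n = stack.pop()
        if n not in reach:
            reach.add(n)
            stack.extend(c for r in pure[n] for c in r if c.isupper())
    return {n: rhss for n, rhss in pure.items() if n in reach}

def is_self_embedded(grammar: Grammar, x: str) -> bool:
    """Step 2: rename X to Ж on the left only, paint X blue, spread the paint."""
    rules = {("Ж" if n == x else n): rhss for n, rhss in grammar.items()}
    blue = {x}
    changed = True
    while changed:
        changed = False
        for n, rhss in rules.items():
            if n not in blue and any(c in blue for r in rhss for c in r):
                blue.add(n)
                changed = True
    return "Ж" in blue

def is_infinite(grammar: Grammar, start: str = "S") -> bool:
    """Step 3: infinite iff some remaining nonterminal is self-embedded."""
    trimmed = trim_useless(grammar, start)
    return any(is_self_embedded(trimmed, n) for n in trimmed)

WITH_BA = {"S": ["ABa", "bAZ", "b"], "A": ["Xb", "bZa"], "B": ["bAA"],
           "X": ["aZa", "bA", "aaa"], "Z": ["ZAbA"]}
WITHOUT_BA = {**WITH_BA, "X": ["aZa", "aaa"]}
print(is_infinite(WITH_BA), is_infinite(WITHOUT_BA))
```

The grammar S → aX | b, X → XXb from the proof is a useful check. X is self-embedded there, but step 1 removes it as unproductive, so the sketch correctly reports a finite language.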


MEMBERSHIP—THE CYK ALGORITHM 

We now turn our attention to the last decision problem we can handle for CFGs. 


THEOREM 45 

Given a CFG and a string x in the same alphabet, we can decide whether or not x can be generated by the CFG.


PROOF 

Our proof will be by constructive algorithm. Given a CFG in CNF and a particular string of 
letters, we will present an algorithm that decides whether or not the string is derivable from 
this grammar. This algorithm is called the CYK algorithm because it was invented by John 
Cocke and subsequently also published by Tadao Kasami (1965) and Daniel H. Younger 
(1967). 

First, let us make a list of all the nonterminals in the grammar S, N_1, N_2, N_3, . . . .

And let the string we are examining for membership in the language be denoted by

x = x_1 x_2 x_3 . . . x_n

In general, it may be that the letters are not all different, but what we are interested in
here is the position of every possible substring of x. We shall be answering the question of
which substrings of x are producible (by extended derivation) from which nonterminals. For
example, if we already know that the substring x_3 . . . x_7 can be derived from the nonterminal N_8, the substring x_8 . . . x_11 can be derived from the nonterminal N_2, and we happen to
have the CNF production N_4 → N_8N_2, then we can conclude that the total substring
x_3 . . . x_11 can be derived from the nonterminal N_4. Symbolically, from

N_8 ⇒* x_3 . . . x_7   and   N_2 ⇒* x_8 . . . x_11   and   N_4 → N_8N_2

we can conclude that

N_4 ⇒* x_3 . . . x_11
We wish to determine, in an organized fashion, a comprehensive list of which substrings
of x are derivable from which nonterminals. If we had such a reliable list, we would know
whether the nonterminal S produces the complete string x, which is what we want to know.

We start off our list with all the substrings of length 1 (the single letters) of x and for
each we determine which nonterminals can produce them. This is easily done because all
such derivations come immediately from the CNF productions nonterminal → terminal:

Substring    All Producing Nonterminals
x_1          N_this, N_that, . . .
x_2          N_such, . . .
. . .
x_n          N_something, . . .


Now we look at all substrings of length 2, such as x_6x_7. This can only be produced from
a nonterminal N_p if the first half can be produced by some nonterminal N_q and the second
half by some nonterminal N_r, and there is a rule of production in the grammar that says
N_p → N_qN_r. We can systematically check all the rules of production and our list above to determine whether the length-2 substrings can be produced:

Substring    All Producing Nonterminals
x_1x_2       N . . .
x_2x_3       N . . .
. . .
x_{n-1}x_n   N . . .

It may be the case that some of these substrings cannot be derived from any nonterminal, but it also may be the case that some can be derived in several ways.

We now move on to substrings of length 3, for example, x_5x_6x_7. This substring can also
be derived from a production of the form N → NN, where the first N produces the first half of
the substring and the second N produces the second half of the substring, but now we have
two different ways of breaking the substring into its two halves. The first half could be x_5x_6
and the second half could be x_7, or the first half could be x_5 and the second half could be
x_6x_7. All nonterminals producing any of these four halves are already on our list, so a simple
check of all the productions in the CFG will determine all the ways (if any) of producing this
(and any other length-3) substring. Our list then grows:

Substring           All Producing Nonterminals
x_1x_2x_3           N . . .
x_2x_3x_4           N . . .
. . .
x_{n-2}x_{n-1}x_n   N . . .

Our list keeps growing. Next, we examine all substrings of length 4. They can be broken
into halves in three different ways: the first three letters and the last letter, the first two letters
and the last two letters, the first letter and the last three letters. For all these possibilities, we
check the list to see what nonterminals produce these halves and whether the two nonterminals can be merged into one by a rule of production: N → NN.
Substring           All Producing Nonterminals
x_1x_2x_3x_4        N . . .
x_2x_3x_4x_5        N . . .
. . .

We continue this same process with substrings of length 5 (made into halves in four
ways each), length 6, and so on. The whole process terminates when the substring under
consideration is all of x:

Substring           All Producing Nonterminals
x_1x_2 . . . x_n    N . . .

We now examine the set of producing nonterminals, and if S is among them, then x
can be produced, and if S is not among them, then x simply cannot be produced by this
CFG.

This algorithm is finite and decisive.
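The table-filling procedure just described can be written as a short program. The Python sketch below is one possible rendering, not the book's. It assumes the grammar is in CNF and stored in a hypothetical dictionary from a one-letter nonterminal to right sides that are either a single lowercase terminal or a pair of uppercase nonterminals. `table[(i, length)]` holds the set of nonterminals producing the substring that starts at position i (counting from 0) and has the given length, which corresponds to one row of the lists built above. The usage lines run it on the two grammars of the examples that follow.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical CNF representation: nonterminal -> right sides, each either a
# single terminal ("a") or a pair of nonterminals ("XY").
Grammar = Dict[str, List[str]]

def cyk(grammar: Grammar, word: str, start: str = "S") -> bool:
    n = len(word)
    if n == 0:
        return False  # a CNF grammar does not generate the null string
    table: Dict[Tuple[int, int], Set[str]] = {}
    # Substrings of length 1: productions nonterminal -> terminal.
    for i, letter in enumerate(word):
        table[(i, 1)] = {lhs for lhs, rhss in grammar.items() if letter in rhss}
    # Longer substrings: try every way of splitting into two nonempty halves.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell: Set[str] = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for lhs, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right
                           for r in rhss):
                        cell.add(lhs)
            table[(i, length)] = cell
    return start in table[(0, n)]

FIRST = {"S": ["XY"], "X": ["XA", "a", "b"], "Y": ["AY", "a"], "A": ["a"]}
SECOND = {"S": ["AA"], "A": ["AA", "a"]}
print(cyk(FIRST, "babaa"), cyk(SECOND, "aaa"))
```

The three nested loops over length, starting position, and split point make the running time grow roughly as the cube of the word length, which is the usual cost quoted for CYK.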


EXAMPLE 

Let us consider the CFG 

S → XY
X → XA | a | b
Y → AY | a
A → a

and let us ask whether the string x = babaa is a word in this language.

We begin our list with all the ways of producing the one-letter substrings of x: 


Substring    All Producing Nonterminals
x_1 = b      X
x_2 = a      X, Y, A
x_3 = b      X
x_4 = a      X, Y, A
x_5 = a      X, Y, A


Now we look at the two-letter substrings. The substring x_1x_2 = ba can only come from
any production whose right side is XX, XY, or XA. Two of these are the right side of a production, and so x_1x_2 can be produced by S or X. The substring x_2x_3 can only come from any
production whose right side is XX, YX, or AX. None of these is the right side of a production, and so this substring cannot be produced. The substring x_3x_4 can only come from productions whose right side is XX, XY, or XA, and so this substring can be produced by S or
X. The substring x_4x_5 can only come from productions whose right side is XX, XY, XA, YX,
YY, YA, AX, AY, or AA. Therefore, this substring can come from S, X, or Y. Our list now includes the following:

Substring    All Producing Nonterminals
x_1x_2       S, X
x_2x_3       (none)
x_3x_4       S, X
x_4x_5       S, X, Y

Now let us consider the substrings of length 3. The first is x_1x_2x_3. If we break this into the
first half x_1 and the second half x_2x_3, we can see from the list that the second half cannot be
produced at all. So, the correct way to break this is into x_1x_2 and x_3. As we see from the table,
the first half can be produced from S or X and the second half can be produced only from X.
This means that in order to form this substring, we would need a production whose right side is
SX or XX. There are no such productions and so this substring cannot be generated.

Let us consider generating the substring x_2x_3x_4. We know it is unprofitable to consider the
first half to be x_2x_3, so we break it into x_2 and x_3x_4. The list says that we can produce this combination from any production whose right side is XS, XX, YS, YX, AS, or AX. Unfortunately, none
of these are right sides of any productions, so this substring cannot be produced either.

The last three-letter substring to consider is x_3x_4x_5. It can be factored into x_3 times x_4x_5, or
x_3x_4 times x_5. The first of these gives XS, XX, or XY; the second gives SX, SY, SA, XX, XY, or XA.
Only XY and XA are on the right sides of a production, and their left nonterminals are S and X.

Our list now includes the following:

Substring    All Producing Nonterminals
x_1x_2x_3    (none)
x_2x_3x_4    (none)
x_3x_4x_5    S, X

This may look fairly bleak, but it is conceivable that the string x still may be formed by
multiplying x_1x_2 with the bottom row, so let us persevere.

The first four-letter substring is x_1x_2x_3x_4. From the list above, it is clear that the only
hope of producing this substring is from the factoring x_1x_2 times x_3x_4. The list tells us that
this can come from a production whose right side is SS, SX, XS, or XX. None of these are the
right sides of productions, so this substring is unproducible.

The other four-letter substring is x_2x_3x_4x_5. The only hope here is to factor this as x_2 times
x_3x_4x_5 because x_2x_3 and x_2x_3x_4 are both unproducible. This factorization gives us the possibilities XS, YS, AS, XX, YX, or AX. None of these are the right side of a production.

The list now includes the following:

Substring       All Producing Nonterminals
x_1x_2x_3x_4    (none)
x_2x_3x_4x_5    (none)


We finally come to the string x itself. We can see that it does not pay to factor it as one letter times four letters, or four letters times one letter, because both four-letter substrings are unproducible; nor does it pay to factor it as x_1x_2x_3 times x_4x_5, because x_1x_2x_3 is unproducible. Remember, because the
grammar is in CNF, all factorizations must contain exactly two factors. Our last resort is
therefore x_1x_2 times x_3x_4x_5. Each factor can be produced only by S or X, but no productions
have the right side SS, XS, SX, or XX. Therefore, this word is unproducible from this
grammar:

Substring          All Producing Nonterminals
x_1x_2x_3x_4x_5    (none)

We should note that for the grammar above, and for any other grammar without unit or
Λ-productions, it is also possible to decide whether a proposed string is in the language generated by that grammar by drawing enough levels of the total language tree. If we draw the
total language tree for the grammar above far enough to produce all five-letter words, we can
then search the tree to see that babaa is not among them. This too could be developed into
an effective decision procedure.
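The total-language-tree alternative just mentioned can also be sketched in Python. This is a hypothetical helper, not from the text, and it uses the same illustrative dictionary representation. It grows the tree level by level with leftmost derivations and discards any working string longer than the target length. That pruning is safe only under the stated assumptions. With no Λ-productions, working strings never shrink. With no unit productions, every step either lengthens the string or turns a nonterminal into a terminal, so the search is guaranteed to stop.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[str]]  # hypothetical: nonterminal -> right sides

def words_up_to(grammar: Grammar, max_len: int, start: str = "S") -> Set[str]:
    """All words of length <= max_len found by expanding the total language tree."""
    words: Set[str] = set()
    level = {start}
    while level:
        next_level: Set[str] = set()
        for working in level:
            # Position of the leftmost nonterminal, or None for a finished word.
            pos = next((i for i, c in enumerate(working) if c.isupper()), None)
            if pos is None:
                words.add(working)
                continue
            for rhs in grammar.get(working[pos], []):
                new = working[:pos] + rhs + working[pos + 1:]
                if len(new) <= max_len:  # safe: working strings never shrink
                    next_level.add(new)
        level = next_level
    return words

GRAMMAR = {"S": ["XY"], "X": ["XA", "a", "b"], "Y": ["AY", "a"], "A": ["a"]}
five_or_fewer = words_up_to(GRAMMAR, 5)
print("babaa" in five_or_fewer)
print(sorted(five_or_fewer, key=lambda w: (len(w), w)))
```

For this grammar every word is an a or a b followed by one or more a's, so the search finds exactly eight words of length 2 through 5, and babaa is not one of them. This agrees with the CYK result.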


EXAMPLE 


Let us consider the following CFG in CNF:

S → AA
A → AA
A → a

Clearly, all the words in this grammar are of the form a*, but are all the words in a* in the language of this grammar? We can see immediately that Λ and a are not, but aa is. Let us use the CYK algorithm to test whether x = aaa is.

The list starts off easily enough:

Substring          All Producing Nonterminals
x₁ = a             A
x₂ = a             A
x₃ = a             A

We can see now that both substrings of length 2 are the same, aa, and are factorable into exactly AA. This is the right side of two productions whose left sides are S and A. Therefore, the list continues:

Substring          All Producing Nonterminals
x₁x₂ = aa          S, A
x₂x₃ = aa          S, A

There is only one length-3 substring, x itself, and it can be factored into x₁ times x₂x₃ or x₁x₂ times x₃. The first case gives the nonterminal possibilities AS or AA, and the second gives the possibilities SA or AA. Of these, only AA is the right side of a production (of two productions, actually). The left sides are S and A. Therefore, the list concludes with the following:

Substring          All Producing Nonterminals
x₁x₂x₃ = aaa       S, A

From this list, we see that the word x can indeed be derived from the start symbol S and so it is in the language. It should also be clear that similarly any string of more than three a's can also be produced by this CFG from the nonterminals S and A. ■
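The table-filling in this example can be sketched as follows. This Python sketch assumes the grammar is already in CNF and splits it into two hypothetical lookup tables, one for productions N → terminal and one for N → BC. The entry `table[(i, length)]` plays the role of the "All Producing Nonterminals" column for the substring of that length starting at position i (counting from 0).

```python
from itertools import product

# S -> AA, A -> AA, A -> a, split into the two kinds of CNF productions.
TERMINAL_RULES = {"a": {"A"}}            # terminal -> {N : N -> terminal}
PAIR_RULES = {("A", "A"): {"S", "A"}}    # (B, C) -> {N : N -> BC}


def cyk_table(word):
    """table[(i, length)] = all nonterminals producing word[i:i + length]."""
    n = len(word)
    table = {(i, 1): set(TERMINAL_RULES.get(ch, ())) for i, ch in enumerate(word)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            producers = set()
            # In CNF every factorization has exactly two factors.
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for pair in product(left, right):
                    producers |= PAIR_RULES.get(pair, set())
            table[(i, length)] = producers
    return table


def cyk_accepts(word, start="S"):
    if not word:
        return False  # a CNF grammar has no Lambda-productions
    return start in cyk_table(word)[(0, len(word))]


print(sorted(cyk_table("aaa")[(0, 3)]))  # ['A', 'S']
print(cyk_accepts("a"))                   # False
```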

PARSING SIMPLE ARITHMETIC 

The CYK algorithm of the previous section answered the question of whether a word was derivable from a certain grammar, not how it was derived. This is also decidable, as we see in this section.

The grammars we presented earlier for AE (arithmetic expressions) were ambiguous. 
This is not acceptable for programming because we want the computer to know and execute 
exactly what we intend. 

Two possible solutions were mentioned earlier: 

1. Require the programmer to insert parentheses to avoid ambiguity. For example, instead 
of the ambiguous 3 + 4*5, insist on 

(3 + 4) * 5   or   3 + (4 * 5)

2. Find a new grammar for the same language that is unambiguous because the interpretation of "operator hierarchy" (i.e., * before +) is built into the system.

Programmers find the first solution too cumbersome and unnatural. Fortunately, there 
are grammars (CFGs) that satisfy the second requirement. 

We present one such grammar for the operations + and * alone, called PLUS-TIMES. The rules of production are

S → E
E → T + E | T
T → F * T | F
F → (E) | i

Loosely speaking, E stands for an expression, T for a term in a sum, F for a factor in a product, and i for any identifier, by which we mean any number or storage location name (variable). The terminals clearly are

+   *   (   )   i

because these symbols occur on the right side of productions, but never on the left side.

To generate the word i + i * i by leftmost derivation, we must proceed as follows:

S ⇒ E
  ⇒ T + E
  ⇒ F + E
  ⇒ i + E
  ⇒ i + T
  ⇒ i + F * T
  ⇒ i + i * T
  ⇒ i + i * F
  ⇒ i + i * i
The syntax tree for this is

                S
                |
                E
              / | \
             T  +  E
             |     |
             F     T
             |   / | \
             i  F  *  T
                |     |
                i     F
                      |
                      i


It is clear from this tree that the word represents the addition of an identifier with the 
product of two identifiers. In other words, the multiplication will be performed before the 
addition, just as we intended it to be in accordance with conventional operator hierarchy. 
Once the computer can discover a derivation for the formula, it can generate a machine- 
language program to accomplish the same task. 
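To make the leftmost derivation above concrete, here is a small Python sketch that replays it. The encoding is our own assumption: PLUS-TIMES is written without blanks, and each step names the right side to substitute for the leftmost nonterminal; the helper names are hypothetical.

```python
# PLUS-TIMES written without blanks; each string is one right side.
PLUS_TIMES = {
    "S": ["E"],
    "E": ["T+E", "T"],
    "T": ["F*T", "F"],
    "F": ["(E)", "i"],
}


def leftmost_step(working, rhs):
    """Replace the leftmost nonterminal of `working` by `rhs`, checking
    that this really is one of its productions."""
    for k, ch in enumerate(working):
        if ch in PLUS_TIMES:
            if rhs not in PLUS_TIMES[ch]:
                raise ValueError(f"{ch} -> {rhs} is not a production")
            return working[:k] + rhs + working[k + 1:]
    raise ValueError(f"{working!r} has no nonterminal left")


def replay(right_sides, start="S"):
    steps = [start]
    for rhs in right_sides:
        steps.append(leftmost_step(steps[-1], rhs))
    return steps


# Right sides used, in order, in the derivation of i + i * i.
print(" => ".join(replay(["E", "T+E", "F", "i", "T", "F*T", "i", "F", "i"])))
# S => E => T+E => F+E => i+E => i+T => i+F*T => i+i*T => i+i*F => i+i*i
```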


DEFINITION

Given a word generated by a particular grammar, the task of finding its derivation is called parsing. ■

Until now we have been interested only in whether a string of symbols was a word in a certain language. We were worried only about the possibility of generation by grammar or acceptance by machine. Now we find that we want to know more. We want to know not just whether a string can be generated by a CFG but also how. We contend that if we know the (or one of the) derivation tree(s) of a given word in a particular language, then we know something about the meaning of the word. This section is different from the other sections in this book because here we are seeking to understand what a word says by determining how it can be generated.

There are many different approaches to the problem of CFG parsing. We shall consider three of them. The first two are general algorithms based on our study of derivation trees for CFGs. The third is specific to arithmetic expressions and makes use of the correspondence between CFGs and PDAs.

The first algorithm is called top-down parsing. We begin with a CFG and a target word. Starting with the symbol S, we try to find some sequence of productions that generates the target word. We do this by checking all possibilities for leftmost derivations. To organize this search, we build a tree of all possibilities, which is like the total language tree of Chapter 12. We grow each branch until it becomes clear that the branch can no longer present a viable possibility; that is, we discontinue growing a branch of the whole language tree as soon as it becomes clear that the target word will never appear on that branch, even generations later. This could happen, for example, if the branch includes in its working string a terminal that does not appear anywhere in the target word or does not appear in the target word in a corresponding position. It is time to see an illustration.

Let us consider the target word

i + i * i

in the language generated by the grammar PLUS-TIMES.

We begin with the start symbol S. At this point, there is only one production we can possibly apply, S → E. From E, there are two possible productions:

E → T + E,   E → T

In each case, the leftmost nonterminal is T and there are two productions possible for replacing this T.

The top-down leftmost parsing tree begins as shown below:

S ⇒ E
E ⇒ T + E,   T
T + E ⇒ F * T + E,   F + E
T ⇒ F * T,   F

In each of the bottom four cases, the leftmost nonterminal is F, which is the left side of two possible productions:

F * T + E ⇒ (E) * T + E (1),   i * T + E (2)
F + E ⇒ (E) + E (3),   i + E (4)
F * T ⇒ (E) * T (5),   i * T (6)
F ⇒ (E) (7),   i (8)


Of these, we can drop branch numbers 1, 3, 5, and 7 from further consideration because they have introduced the terminal character "(", which is not the first (or any) letter of our word. Once a terminal character appears in a working string, it never leaves. Productions change the nonterminals into other things, but the terminals stay forever. All four of those branches can produce only words with parentheses in them, not i + i * i. Branch 8 has ended its development naturally in a string of all terminals, but it is not our target word, so we can discontinue the investigation of this branch, too. Our pruned tree looks like this:

S ⇒ E ⇒ T + E ⇒ F * T + E ⇒ i * T + E (2)
S ⇒ E ⇒ T + E ⇒ F + E ⇒ i + E (4)
S ⇒ E ⇒ T ⇒ F * T ⇒ i * T (6)

Because both branches 7 and 8 vanished, we dropped the line that produced them:

T ⇒ F

All three branches have actually derived the first two terminal letters of the words that they can produce. Each of the three branches left starts with two terminals that can never change. Branch 4 says the word starts with "i +", which is correct, but branches 2 and 6 can now produce only words that start "i *", which is not in agreement with our desired target word. The second letter of all words derived on branches 2 and 6 is *; the second letter of the target word is +. We must kill these branches before they multiply.
Deleting branch 6 prunes the tree up to the derivation E ⇒ T, which has proved fruitless as none of its offshoots can produce our target word. Deleting branch 2 tells us that we can eliminate the left branch out of T + E. With all the pruning we have now done, we can conclude that any branch leading to i + i * i must begin

S ⇒ E ⇒ T + E ⇒ F + E ⇒ i + E

Let us continue this tree two more generations. We have drawn all derivation possibilities. Now it is time to examine the branches for pruning.

i + E ⇒ i + T + E,   i + T
i + T + E ⇒ i + F * T + E (9),   i + F + E (10)
i + T ⇒ i + F * T (11),   i + F (12)

At this point, we are now going to pull a new rule out of our hat. Because no production in a CFG without Λ-productions (such as PLUS-TIMES) can decrease the length of the working string of terminals and nonterminals on which it operates (each production replaces one symbol by one or more), once the length of a working string has passed 5, it can never produce a final word of length 5. We can therefore delete branch 9 on this basis alone. No words that it generates can have as few as five letters.

Another observation we can make is that even though branch 10 is not too long and it begins with a correct string of terminals, it can still be eliminated because it has produced another + in the working string. This is a terminal that all descendants on the branch will have to include. However, there is no second + in the word we are trying to derive. Therefore, we can eliminate branch 10, too.

This leaves us with only branches 11 and 12 that continue to grow.

i + F * T ⇒ i + (E) * T (13),   i + i * T (14)
i + F ⇒ i + (E) (15),   i + i (16)

Now branches 13 and 15 have introduced the forbidden terminal "(", while branch 16 has terminated its growth at the wrong word. Only branch 14 deserves to live. At this point, we draw the top half of the tree horizontally:

S ⇒ E ⇒ T + E ⇒ F + E ⇒ i + E ⇒ i + T ⇒ i + F * T ⇒ i + i * T
i + i * T ⇒ i + i * F * T,   i + i * F
i + i * F ⇒ i + i * (E),   i + i * i
In this way, we have discovered that the word i + i * i can be generated by this CFG and we have found the unique leftmost derivation that generates it.

To recapitulate the algorithm: From every live node we branch for all productions applicable to the leftmost nonterminal. We kill a branch for having the wrong initial string of terminals, having a bad terminal anywhere in the string, simply growing too long, or turning into the wrong string of terminals.
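The recapitulated algorithm can be sketched as a breadth-first search. In this Python sketch (our own illustration, hard-wired to PLUS-TIMES written without blanks), the hypothetical helper `viable` applies the four reasons for killing a branch just listed, and the search returns the first derivation it reaches as a list of working strings.

```python
from collections import deque

PLUS_TIMES = {
    "S": ["E"],
    "E": ["T+E", "T"],
    "T": ["F*T", "F"],
    "F": ["(E)", "i"],
}


def viable(working, target):
    """Apply the four reasons for killing a branch to one working string."""
    first_nt = next((k for k, ch in enumerate(working) if ch in PLUS_TIMES), None)
    if first_nt is None:                           # only terminals left:
        return working == target                   # wrong string of terminals?
    if len(working) > len(target):                 # grown too long
        return False
    if not target.startswith(working[:first_nt]):  # wrong initial terminals
        return False
    # a terminal that appears nowhere in the target word
    return all(ch in PLUS_TIMES or ch in target for ch in working)


def top_down_parse(target, start="S"):
    """Breadth-first growth of the tree of leftmost derivations."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        working = path[-1]
        if working == target:
            return path
        pos = next((k for k, ch in enumerate(working) if ch in PLUS_TIMES), None)
        if pos is None:
            continue
        for rhs in PLUS_TIMES[working[pos]]:
            child = working[:pos] + rhs + working[pos + 1:]
            if viable(child, target):
                queue.append(path + [child])
    return None


print(" => ".join(top_down_parse("i+i*i")))
# S => E => T+E => F+E => i+E => i+T => i+F*T => i+i*T => i+i*F => i+i*i
```

Because PLUS-TIMES is unambiguous, the leftmost derivation found is the only one.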

With the method of tree search known as backtracking, it is not necessary to grow all 
the live branches at once. Instead, we can pursue one branch downward until either we reach 
the desired word, or else we terminate it because of a bad character or excessive length. At 
this point, we back up to a previous node to travel down the next road until we find the target 
word or another dead end, and so on. Backtracking algorithms are more properly the subject 
of a different course. As usual, we are more interested in showing what can be done, not in 
determining which method is best. 
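A backtracking version of the same search might look like the following sketch. It is again specific to PLUS-TIMES written without blanks, and for brevity it checks only for excessive length and for wrong terminals in front of the leftmost nonterminal. The recursion does the backing up: when a call returns None, the loop moves on to the next production.

```python
PLUS_TIMES = {
    "S": ["E"],
    "E": ["T+E", "T"],
    "T": ["F*T", "F"],
    "F": ["(E)", "i"],
}


def backtrack_parse(working, target, path=()):
    """Follow one branch down until it reaches the target or dies,
    then back up and try the next production."""
    path = path + (working,)
    if working == target:
        return list(path)
    if len(working) > len(target):
        return None                    # excessive length: back up
    pos = next((k for k, ch in enumerate(working) if ch in PLUS_TIMES), None)
    if pos is None or working[:pos] != target[:pos]:
        return None                    # wrong terminals: back up
    for rhs in PLUS_TIMES[working[pos]]:
        child = working[:pos] + rhs + working[pos + 1:]
        found = backtrack_parse(child, target, path)
        if found is not None:
            return found
    return None


print(" => ".join(backtrack_parse("S", "i+i*i")))
# S => E => T+E => F+E => i+E => i+T => i+F*T => i+i*T => i+i*F => i+i*i
```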

We have only given a beginner’s list of reasons for terminating the development of a 
node in the tree. A more complete set of rules follows: 

1. Bad Substring: If a substring of solid terminals (one or more) has been introduced into a working string in a branch of the total-language tree, all words derived from it must also include that substring unaltered. Therefore, any substring that does not appear in the target word is cause for eliminating the branch.

2. Good Substrings but Too Many: The working string has more occurrences of the particular substring than the target word does. In a sense, Rule 1 is a special case of this.

3. Good Substrings but Wrong Order: If the working string is YabXYbaXX but the target word is bbbbaab, then both substrings of terminals developed so far, ab and ba, are valid substrings of the target word, but they do not occur in the same order in the working string as in the word. So, the working string cannot develop into the target word.

4. Improper Outer-terminal Substring: Substrings of terminals developed at the beginning or end of the working string will always stay at the ends at which they first appear. They must be in perfect agreement with the target word or the branch must be eliminated.

5. Excess Projected Length: If the working string is aXbbYYXa and all the productions with a left side of X have right sides of six characters, then the ultimate words derived from this working string must have a length of at least 1 + 6 + 1 + 1 + 1 + 1 + 6 + 1 = 18. If the target word has fewer than 18 letters, kill this branch. (We are assuming that all Λ-productions have been eliminated.)

6. Wrong Target Word: If we have only terminals left but the string is not the target word, forget it.
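Rules 1 through 6 can be expressed as a single pruning test. This Python sketch is our own reading of the rules, not a definitive formulation: `min_len`, a table giving the length of the shortest word each nonterminal can derive, is an assumed precomputed input, and Rules 1 to 3 are checked together by matching the solid terminal substrings against the target in left-to-right order without overlap.

```python
def terminal_runs(working, nonterminals):
    """Maximal substrings of solid terminals, left to right."""
    runs, current = [], ""
    for ch in working:
        if ch in nonterminals:
            if current:
                runs.append(current)
            current = ""
        else:
            current += ch
    if current:
        runs.append(current)
    return runs


def runs_fit_in_order(runs, target):
    """Rules 1-3: each run occurs in the target, without overlapping the
    previous one, in the same left-to-right order."""
    pos = 0
    for run in runs:
        k = target.find(run, pos)  # the earliest match leaves the most room
        if k < 0:
            return False
        pos = k + len(run)
    return True


def should_prune(working, target, min_len):
    """min_len: nonterminal -> length of its shortest word (all >= 1,
    since Lambda-productions are assumed eliminated)."""
    nonterminals = set(min_len)
    if not any(ch in nonterminals for ch in working):
        return working != target                     # Rule 6
    runs = terminal_runs(working, nonterminals)
    if not runs_fit_in_order(runs, target):
        return True                                  # Rules 1, 2, 3
    if working[0] not in nonterminals and not target.startswith(runs[0]):
        return True                                  # Rule 4 (front)
    if working[-1] not in nonterminals and not target.endswith(runs[-1]):
        return True                                  # Rule 4 (back)
    projected = sum(min_len.get(ch, 1) for ch in working)
    return projected > len(target)                   # Rule 5


print(should_prune("YabXYbaXX", "bbbbaab", {"X": 1, "Y": 1}))  # True (Rule 3)
```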

There may be even more rules depending on the exact nature of the grammar. These rules 
apply to more than just PLUS-TIMES, as we can see from the following example. 
EXAMPLE

Let us recall the CFG for the language EQUAL:

S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

The word bbabaa is in EQUAL. Let us determine a leftmost derivation for this word by top-down parsing.

From the start symbol S, the derivation tree can take one of two tracks:

S ⇒ aB (1),   bA (2)


All words derived from branch 1 must begin with the letter a, but our target word does not. Therefore, by Rule 4, only branch 2 need be considered. The leftmost nonterminal is now A. There are three branches possible at this point:

bA ⇒ ba (3),   baS (4),   bbAA (5)


Branch 3 is a completed word but not our target word. Branch 4 will generate only words with an initial string of terminals ba, which is not the case with bbabaa. Only branch 5 remains a possibility. The leftmost nonterminal in the working string of branch 5 is the first A. Three productions apply to it:

bbAA ⇒ bbaA (6),   bbaSA (7),   bbbAAA (8)

Branches 6 and 7 seem perfectly possible. Branch 8, however, has generated the terminal substring bbb, which all its descendants must bear. This substring does not appear in our target word, so we can eliminate this branch from further consideration.

In branch 6, the leftmost nonterminal is A; in branch 7, it is S.

bbaA ⇒ bbaa (9),   bbaaS (10),   bbabAA (11)
bbaSA ⇒ bbaaBA (12),   bbabAA (13)


Branch 9 is a string of all terminals, but not the target word. Branch 10 has the initial substring bbaa; the target word does not. This detail also kills branch 12. Branch 11 and branch 13 are identical. If we wanted all the leftmost derivations of this target word, we would keep both branches growing. Because we need only one derivation, we may just as well keep branch 13 and drop branch 11 (or vice versa); whatever words can be produced on one branch can be produced on the other.

S ⇒ bA ⇒ bbAA ⇒ bbaSA ⇒ bbabAA
bbabAA ⇒ bbabaA (14),   bbabaSA (15),   bbabbAAA (16)

Only the working string in branch 14 is not longer than the target word. Branches 15 
and 16 can never generate a six-letter word. 

S ⇒ bA ⇒ bbAA ⇒ bbaSA ⇒ bbabAA ⇒ bbabaA
bbabaA ⇒ bbabaa (17),   bbabaaS (18),   bbababAA (19)

Branches 18 and 19 are too long, so it is a good thing that branch 17 is our word. This com¬ 
pletes the derivation. ■ 
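The same search can be run mechanically on EQUAL. In this Python sketch (an illustration, not the book's procedure), branches are pruned by Rule 4 (initial terminals), Rule 5 (length, since every symbol yields at least one letter), and Rule 6, and a `seen` set discards duplicate working strings such as branches 11 and 13. Because it keeps whichever duplicate it meets first, it happens to return the derivation through branch 11 rather than branch 13; as noted above, either one will do.

```python
from collections import deque

EQUAL = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}


def leading_terminals(working, grammar):
    """The solid string of terminals in front of the first nonterminal."""
    for k, ch in enumerate(working):
        if ch in grammar:
            return working[:k]
    return working


def leftmost_derivation(grammar, target, start="S"):
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        working = path[-1]
        if working == target:
            return path
        pos = next((k for k, ch in enumerate(working) if ch in grammar), None)
        if pos is None:
            continue  # Rule 6: only terminals, and not the target word
        for rhs in grammar[working[pos]]:
            child = working[:pos] + rhs + working[pos + 1:]
            if (len(child) <= len(target)                                   # Rule 5
                    and target.startswith(leading_terminals(child, grammar))  # Rule 4
                    and child not in seen):                                  # duplicate
                seen.add(child)
                queue.append(path + [child])
    return None


print(" => ".join(leftmost_derivation(EQUAL, "bbabaa")))
# S => bA => bbAA => bbaA => bbabAA => bbabaA => bbabaa
```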

The next parsing algorithm we shall illustrate is the bottom-up parser. This time we do 
not ask what were the first few productions used in deriving the word, but what were the last 
few. We work backward from the end to the front, the way sneaky people do when they try 
to solve a maze. 

Let us again consider as our example the word i + i * i generated by the CFG PLUS-TIMES.

If we are trying to reconstruct a leftmost derivation, we might think that the last terminal to be derived was the last letter of the word. However, this is not always the case. For example, in the grammar

S → Abb
A → a

the word abb is formed in two steps, but the final two b's were introduced in the first step of the derivation, not the last. So instead of trying to reconstruct specifically a leftmost derivation, we have to search for any derivation of our target word. This makes the tree much larger. We begin at the bottom of the derivation tree, that is, with the target word itself, and step by step work our way back up the tree seeking to find when the working string was the one single S.

Let us reconsider the CFG PLUS-TIMES:

S → E
E → T + E | T
T → F * T | F
F → (E) | i

To perform a bottom-up search, we shall be reiterating the following step: Find all substrings 
of the present working string of terminals and nonterminals that are right halves of produc¬ 
tions and substitute back to the nonterminal that could have produced them. 
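The single backward step just described can be sketched in Python. The rule list is PLUS-TIMES written without blanks, and `reductions` is a hypothetical helper that returns every working string one step closer to S, in the order the rules are listed.

```python
RULES = [("S", "E"), ("E", "T+E"), ("E", "T"), ("T", "F*T"),
         ("T", "F"), ("F", "(E)"), ("F", "i")]


def reductions(working):
    """Every string obtained by replacing one occurrence of a right side
    with the left side that could have produced it."""
    results = []
    for lhs, rhs in RULES:
        k = working.find(rhs)
        while k != -1:
            candidate = working[:k] + lhs + working[k + len(rhs):]
            if candidate not in results:
                results.append(candidate)
            k = working.find(rhs, k + 1)
    return results


print(reductions("i+i*i"))  # ['F+i*i', 'i+F*i', 'i+i*F']
print(reductions("F+F*F"))  # ['T+F*F', 'F+T*F', 'F+F*T']
```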

Three substrings of i + i * i are right halves of productions, namely, the three i's, any one of which could have been produced by an F. The tree of possibilities begins as follows:

i + i * i   ⇐   F + i * i,   i + F * i,   i + i * F
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Even though we are going from the bottom of the derivation tree to the top S, we will 
still draw the tree of possibilities, as all our trees, from the top of the page downward. 

We can save ourselves some work in this particular example by realizing that all the i's come from the production F → i and the working string we should be trying to derive is F + F * F. Strictly speaking, this insight should not be allowed because it requires an idea that we did not include in the algorithm to begin with. But because it saves us a considerable amount of work, we succumb to the temptation and write in one step:

i + i * i   ⇐   F + F * F

Not all the F's had to come from T → F. Some could have come from T → F * T, so we cannot use the same trick again.

i + i * i   ⇐   F + F * F
F + F * F   ⇐   T + F * F,   F + T * F,   F + F * T

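The first two branches contain substrings that could be the right halves of E → T and T → F. The third branch has the additional possibility of T → F * T.

The tree continues:

T + F * F   ⇐   E + F * F (1),   T + T * F (2),   T + F * T (3)
F + T * F   ⇐   T + T * F (4),   F + E * F (5),   F + T * T (6)
F + F * T   ⇐   T + F * T (7),   F + T * T (8),   F + F * E (9),   F + T (10)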


We never have to worry about the length of the intermediate strings in bottom-up pars¬ 
ing because they can never exceed the length of the target word. At each stage, they stay the 
same length or get shorter. Also, no bad terminals are ever introduced because no new termi¬ 
nals are ever introduced at all, only nonterminals. These are efficiencies that partially com¬ 
pensate for the inefficiency of not restricting ourselves to leftmost derivations. 

There is the possibility that a nonterminal is bad in certain contexts. For example, branch 1 now has an E as its leftmost character. The only production that will ever absorb that E is S → E. This would give us the nonterminal S, but S is not in the right half of any production. It is true that we want to end up with the S; that is the whole goal of the tree. However, we shall want the entire working string to be that single S, not a longer working string with S as its first letter. The rest of the expression in branch 1, "+ F * F", is not just going to disappear. So, branch 1 gets the ax. The E's in branch 5 and branch 9 are none too promising either, as we shall see in a moment.

When we go backward, we no longer have the guarantee that the "inverse grammar" is unambiguous even though the CFG itself might be. In fact, this backward tracing is probably not unique, because we are not restricting ourselves to finding a leftmost derivation. We should also find the trails of rightmost derivations and what-not. This is reflected in the occurrence of repeated expressions in the branches. In our example, branch 2 is now the same as branch 4, branch 3 is the same as branch 7, and branch 6 is the same as branch 8. Because we are interested here in finding any one derivation, not all derivations, we can safely kill branches 2, 3, and 6 and still find a derivation, if one exists.

The tree grows ferociously, like a bush, very wide but not very tall. It would grow too 
unwieldy unless we made the following observation. 

Observation 

No intermediate working string of terminals and nonterminals can have the substring "E *". This is because the only production that introduces the * is

T → F * T

so the symbol to the immediate left of an * is originally F. From this F, we can only get the terminals ")" or i next to the star. Therefore, in a top-down derivation we could never create the substring "E *" in this CFG, so in bottom-up this can never occur in an intermediate working string leading back to S. Similarly, "E +" and "* E" are also forbidden in the sense that they cannot occur in any derivation. The idea of forbidden substrings is one that we played with in Chapter 3. We can now see the importance of the techniques we introduced there for showing certain substrings never occur [and everybody thought Theorems 2, 3, and 4 (see pp. 26-27) were completely frivolous]. With the aid of this observation, we can eliminate branches 5 and 9.
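Putting the backward step together with the forbidden substrings gives a small bottom-up parser. In this Python sketch, which is our own illustration rather than the book's procedure, repeated working strings are discarded (as branches 2, 3, and 6 were), any string containing "E*", "E+", or "*E" is pruned, and so is any string in which S appears without being the whole working string. It returns the chain of working strings from the target back to S.

```python
from collections import deque

RULES = [("S", "E"), ("E", "T+E"), ("E", "T"), ("T", "F*T"),
         ("T", "F"), ("F", "(E)"), ("F", "i")]
FORBIDDEN = ("E*", "E+", "*E")


def reductions(working):
    """Every string obtained by replacing one occurrence of a right side
    with its left side."""
    results = []
    for lhs, rhs in RULES:
        k = working.find(rhs)
        while k != -1:
            candidate = working[:k] + lhs + working[k + len(rhs):]
            if candidate not in results:
                results.append(candidate)
            k = working.find(rhs, k + 1)
    return results


def bottom_up_parse(target):
    seen = {target}
    queue = deque([[target]])
    while queue:
        path = queue.popleft()
        if path[-1] == "S":
            return path
        for earlier in reductions(path[-1]):
            if earlier in seen:
                continue  # a repeated expression: one copy is enough
            if any(bad in earlier for bad in FORBIDDEN):
                continue  # cannot occur in any derivation
            if "S" in earlier and earlier != "S":
                continue  # S must end up as the whole working string
            seen.add(earlier)
            queue.append(path + [earlier])
    return None


print(" <= ".join(bottom_up_parse("i+i*i")))
```

Working strings never grow when we go backward, so only finitely many can appear and the search always stops.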

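The tree now grows as follows (pruning away anything with a forbidden substring):

i + i * i   ⇐   F + F * F
F + F * F   ⇐   F + T * F,   F + F * T
F + T * F   ⇐   T + T * F   ⇐   T + T * T (11)
F + F * T   ⇐   T + F * T   ⇐   T + T * T (12),   T + T (13)
F + F * T   ⇐   F + T * T   ⇐   T + T * T (14)
F + F * T   ⇐   F + T   ⇐   T + T (15),   F + E (16)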


Branches 11, 12, and 13 are repeated in 14 and 15, so we drop the former. Branch 14 has nowhere to go, because none of the T's can become E's without creating forbidden substrings. So, branch 14 must be dropped. From branches 15 and 16, the only next destination is T + E, so we can drop branch 15 because 16 gets us there just as well by itself. The tree ends as follows:

i + i * i ⇐ F + F * F ⇐ F + F * T ⇐ F + T ⇐ F + E ⇐ T + E ⇐ E ⇐ S

which is the same as

S ⇒ E ⇒ T + E ⇒ F + E ⇒ F + T ⇒ F + F * T ⇒ F + F * F ⇒ i + i * i

(The symbol ⇐ used above should be self-explanatory.)

Our last algorithm for "understanding" words in order to evaluate expressions is one based on the prefix notation mentioned in Chapter 12, called Lukasiewicz notation. This applies not only to arithmetic expressions but also to many other programming language instructions.

We shall assume that we are now using postfix notation, where the two operands immediately precede the operator:

A + B              becomes   AB +
(A + B) * C        becomes   AB + C *
A * (B + C * D)    becomes   ABCD * + *
An algorithm for converting standard infix notation into postfix notation was given in Chapter 12. Once an expression is in postfix, we can evaluate it without finding its derivation from a CFG, although we originally made use of its parsing tree to convert the infix into postfix in the first place. We are assuming here that our expressions involve only numerical values for the identifiers (i's) and only the operations + and *, as in the language PLUS-TIMES.

We can evaluate these postfix expressions by a new machine similar to a PDA. Such a machine requires three new states:

1. ADD: This state pops the top two entries off the STACK, adds them, and pushes the result onto the top of the STACK.

2. MPY: This state pops the top two entries off the STACK, multiplies them, and pushes the result onto the top of the STACK.

3. PRINT: The print state always follows a POP or READ. This prints the last character just popped or read.

The machine to evaluate postfix expressions can now be built as below, where the expression to be evaluated has been put on the INPUT TAPE in the usual fashion, one character per cell starting in the first cell.

[Figure: the machine that evaluates postfix expressions]

Let us trace the action of this machine on the input string

75 + 24 + * 6 +

which is postfix for

(7 + 5) * (2 + 4) + 6

STATE     STACK      TAPE
START     Δ          75+24+*6+
READ      Δ          5+24+*6+
PUSH i    7          5+24+*6+
READ      7          +24+*6+
PUSH i    5 7        +24+*6+
READ      5 7        24+*6+
ADD       12         24+*6+
READ      12         4+*6+
PUSH i    2 12       4+*6+
READ      2 12       +*6+
PUSH i    4 2 12     +*6+
READ      4 2 12     *6+
ADD       6 12       *6+
READ      6 12       6+
MPY       72         6+
READ      72         +
PUSH i    6 72       +
READ      6 72       Δ
ADD       78         Δ
READ      78         Δ

We notice that just as we finished reading the entire input string, the STACK has only one el¬ 
ement in it. We conclude processing by popping 78, printing 78, and accepting the input string. 
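The evaluating machine can be simulated with an ordinary list as the STACK. This Python sketch assumes, as in the trace, that every number is a single digit and that the only operators are + and *; like the machine, it does not check that the input is a well-formed postfix expression.

```python
def evaluate_postfix(tape):
    """Digits are pushed (PUSH i); '+' and '*' pop the top two entries and
    push their sum or product (ADD, MPY); at the end of the tape the single
    remaining entry is popped as the answer (POP, then PRINT)."""
    stack = []
    for ch in tape:
        if ch.isdigit():
            stack.append(int(ch))
        elif ch in "+*":
            top, below = stack.pop(), stack.pop()
            stack.append(below + top if ch == "+" else below * top)
        elif not ch.isspace():
            raise ValueError(f"unexpected symbol {ch!r}")
    return stack.pop()


print(evaluate_postfix("75+24+*6+"))  # 78
```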

What we have been using here is a PDA with arithmetic and output capabilities. Just as we 
expanded FAs to Mealy and Moore machines, we can expand PDAs to what are called push¬ 
down transducers. These are very important but belong to the study of the theory of compilers. 

The task of converting infix arithmetic expressions (normal ones) into postfix can also be accomplished by a pushdown transducer as an alternative to depending on a dotted line circumnavigating a parsing tree. This time all we require is a PDA with an additional PRINT instruction. The input string will be read off of the TAPE character by character. If the character is a number (or, in our example, the letters a, b, c), it is immediately printed out, because the operands in postfix occur in the same order as in the infix equivalent. The operators, however, + and * in our example, must wait to be printed until after the second operand they govern has been printed. The place where the operators wait is, of course, the STACK. If we read a + b, we print a, push +, print b, pop +, print +. The output states we need are
[Figure: the output states POP-PRINT, READ-PRINT, and READ-PUSH]


POP-PRINT prints whatever it has just popped, and READ-PRINT prints the character just read. READ-PUSH pushes whatever character "+", "*", or "(" labels the edge leading into it. These are all the machine parts we need.

One more comment should be made about when an operator is ready to be popped. The 
second operand is recognized by encountering (1) a right parenthesis, (2) another operator 
having equal or lower precedence, or (3) the end of the input string. 

When a right parenthesis is encountered, it means that the infix expression is complete 
back up to the last left parenthesis. 

For example, consider the expression

a * (b + c) + b + c

1. Read a, print a.

2. Read *, push *.

3. Read (, push (.

4. Read b, print b.

5. Read +, push +.

6. Read c, print c.

7. Read ), pop +, print +.

8. Pop (.

9. Read +; we cannot push + on top of * because of operator precedence, so pop *, print *, push +.

10. Read b, print b.

11. Read +; we cannot push + on top of +, so pop +, print +, push +.

12. Read c, print c.

13. Read Δ, pop +, print +.

The resulting output sequence is 

abc + * b + c + 

which indeed is the correct postfix equivalent of the input. Notice that operator precedence is "built into" this machine. Generalizations of this machine can handle any arithmetic expressions, including -, /, and **.
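The conversion just traced can be sketched with a list as the STACK. This Python sketch assumes single-character operands and only the operators + and *. As described above, an operator waiting on the STACK is popped and printed when a right parenthesis, an operator of equal or lower precedence, or the end of the input arrives.

```python
PRECEDENCE = {"+": 1, "*": 2}


def infix_to_postfix(expression):
    output, stack = [], []
    for ch in expression:
        if ch.isspace():
            continue
        if ch.isalnum():
            output.append(ch)                    # READ-PRINT: operands at once
        elif ch == "(":
            stack.append(ch)                     # READ-PUSH (
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())       # POP-PRINT
            stack.pop()                          # pop the "(" without printing
        elif ch in PRECEDENCE:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())       # POP-PRINT
            stack.append(ch)                     # READ-PUSH the operator
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    while stack:
        output.append(stack.pop())               # end of input: empty the STACK
    return "".join(output)


print(infix_to_postfix("a * (b + c) + b + c"))    # abc+*b+c+
print(infix_to_postfix("(a + b) * (b + c * a)"))  # ab+bca*+*
```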

The diagram of the pushdown transducer to convert infix to postfix is given on the next 
page. 

The table following it traces the processing of the input string 

(a + b) * (b + c * a) 

Notice that the printing takes place on the right end of the output sequence. 

One trivial observation is that this machine will never print any parentheses. No parentheses are needed to understand postfix or prefix notation. Another is that every operator and operand in the original expression will be printed out. The major observation is that if the output of this transducer is then fed into the previous transducer, the original infix arithmetic expression will be evaluated correctly. In this way, we can give a PDA an expression in normal arithmetic notation, and the PDA will evaluate it.


The pushdown transducer will do the following: 


STATE     STACK     TAPE               OUTPUT
START     Δ         (a+b)*(b+c*a)
READ      Δ         a+b)*(b+c*a)
PUSH (    (         a+b)*(b+c*a)
READ      (         +b)*(b+c*a)
PRINT     (         +b)*(b+c*a)        a
READ      (         b)*(b+c*a)         a
POP       Δ         b)*(b+c*a)         a
PUSH (    (         b)*(b+c*a)         a
PUSH +    +(        b)*(b+c*a)         a
READ      +(        )*(b+c*a)          a
PRINT     +(        )*(b+c*a)          ab
READ      +(        *(b+c*a)           ab
POP       (         *(b+c*a)           ab
PRINT     (         *(b+c*a)           ab+
POP       Δ         *(b+c*a)           ab+
READ      Δ         (b+c*a)            ab+
POP       Δ         (b+c*a)            ab+
PUSH *    *         (b+c*a)            ab+
READ      *         b+c*a)             ab+
PUSH (    (*        b+c*a)             ab+
READ      (*        +c*a)              ab+
PRINT     (*        +c*a)              ab+b
READ      (*        c*a)               ab+b
POP       *         c*a)               ab+b
PUSH (    (*        c*a)               ab+b
PUSH +    +(*       c*a)               ab+b
READ      +(*       *a)                ab+b
PRINT     +(*       *a)                ab+bc
READ      +(*       a)                 ab+bc
POP       (*        a)                 ab+bc
PUSH +    +(*       a)                 ab+bc
PUSH *    *+(*      a)                 ab+bc
READ      *+(*      )                  ab+bc
PRINT     *+(*      )                  ab+bca
READ      *+(*      Δ                  ab+bca
POP       +(*       Δ                  ab+bca
PRINT     +(*       Δ                  ab+bca*
POP       (*        Δ                  ab+bca*
PRINT     (*        Δ                  ab+bca*+
POP       *         Δ                  ab+bca*+
READ      *         Δ                  ab+bca*+
POP       Δ         Δ                  ab+bca*+
PRINT     Δ         Δ                  ab+bca*+*
POP       Δ         Δ                  ab+bca*+*
ACCEPT    Δ         Δ                  ab+bca*+*


PROBLEMS


1. Decide whether or not the following grammars generate any words using the algorithm of Theorem 42 (p. 403):

(i) S → aSa | bSb          (iv) S → XS

(ii) S → XY                X → YX


C → b | bb

2. Modify the proof of Theorem 42 so that it can be applied to any CFG, not just those in CNF.

3. For each of the following grammars, decide whether the language they generate is finite or infinite using the algorithm in Theorem 44 (p. 408):

(i) S → XS | b             (v) S → XY

X → YZ                     X → AA | YY | b


4. Modify Theorem 44 so that the decision procedure works on all CFGs, not just those in CNF.
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5. Prove that all CFGs with only the one nonterminal S and one or more live productions 
and one or more dead productions generate an infinite language. 

For the following grammars and target strings, decide whether or not the word is gener¬ 
ated by the grammar using the CYK algorithm: 

6. S → SS          x = abba


7. S → XS          x = baab
X → XX
X → a
S → b

8. S → XY          x = abbaa
X → SY
Y → SS
X → a | bb
Y → aa

9. S → AB          x = bbaab
A → BB | a
B → AB | b

10. S → AB | CD | a | b          x = bababab
A → a
B → SA
C → DS
D → b

11. Modify the CYK algorithm so that it applies to any CFG, not just those in CNF.

12. The CYK algorithm can be described as bottom-up because it starts with the word and works up to the nonterminals. There is another method for deciding membership that is top-down in nature. Create a table with one column for each nonterminal that appears in the grammar and n rows, where n is the length of the subject word. The entries for cell (i, j) are those words of length i that can be derived from the nonterminal, N_j, at the head of the column. The first row is filled based on the dead productions N → t. Subsequent rows are filled based on the productions N → N_1 N_2. In the second row, cell (2, z) is filled with all the words of length 2 that are the product of a letter from cell (1, x) and a letter from cell (1, y) for each rule N_z → N_x N_y. In the third row, cell (3, z) is filled with the words that are products of a word from row 2 and a word from row 1 in either order, as long as the grammar includes a rule that generates that product. In the fourth row, the words can be made in three ways: the product of a letter and a 3-letter word, the product of two 2-letter words, or the product of a 3-letter word and a single letter. When the table is complete, check cell (n, S) to see if w is among the words derived from S.

For each of the following grammar-word pairs, construct such a table to determine 
whether the word can be generated by that grammar: 


S → XY
X → XA | a
Y → AY | a
A → a

w = babaa


S → AX | E
X → SA
Y → SB
A → a
B → b
w = ababa


abbaa
13. Using top-down parsing, find the leftmost derivation in the grammar PLUS-TIMES for the following expressions:

(i) i + i + i

(ii) i * i + i * i

(iii) i * (i + i) * i

(iv) ((0*0 + 0) + / 

(v) (((0) + ((0» 

14. Using bottom-up parsing, find any derivation in the grammar PLUS-TIMES for the following expressions:

(i) i * (i)

(ii) ((i) + ((i)))

(iii) (i * i + i)

(iv) i * (i + i)

(v) 0*0*/ 

15. The following is a version of an unambiguous grammar for arithmetic expressions employing - and / as well as + and *:

S → E
E → T | E + T | E - T | -T
T → F | T * F | T / F
F → (E) | i

Find a leftmost derivation in this grammar for the following expressions using the
parsing algorithms specified:

(i) (0 + 0 ~ / * 0 / / — / 

(Do this by inspection; that means guesswork. Do we divide by zero here?) 

(ii) H i + i (Top-down) 

(iii) i * i / i - i (Top-down)

(iv) if Hi (Top-down) 

Note that this is not ambiguous in this particular grammar. Do we evaluate right to 
left or left to right? 

(v) i - i - i (Bottom-up)

16. Using the second pushdown transducer, convert the following arithmetic expressions to 
postfix notation and then evaluate them on the first pushdown transducer: 

(i) 2 * (7 + 2) 

(ii) 3*4 + 7 

(iii) (3 + 5) + 7 * 3 

(iv) (3 * 4 + 5) * (2 + 3 * 4) Hint: The answer is 238. 

17. Design a pushdown transducer to convert infix to prefix. 

18. Design a pushdown transducer to evaluate prefix. 

19. Create an algorithm to convert prefix to postfix. 

20. The transducers we designed in this chapter to evaluate postfix notation and to convert 
infix to postfix have a funny quirk: They can accept some bad input strings and process 
them as if they were proper. 

(i) For each machine, find an example of an accepted bad input. 

(ii) Correct these machines so that they accept only proper inputs. 

















THE TURING MACHINE 

At this point it will help us to recapitulate the major themes of the previous two parts and 
outline all the material we have yet to present in the rest of the book in one large table: 


Language Defined By | Corresponding Acceptor | Nondeterminism = Determinism? | Language Closed Under | What Can Be Decided | Example of Application

Regular expression | Finite automaton, transition graph | Yes | Union, product, Kleene star, intersection, complement | Equivalence, emptiness, finiteness, membership | Text editors, sequential circuits

Context-free grammar | Pushdown automaton | No | Union, product, Kleene star | Emptiness, finiteness, membership | Programming language statements, compilers

Type 0 grammar | Turing machine, Post machine, 2PDA, nPDA | Yes | Union, product, intersection, Kleene star | Not much | Computers


We see from the lower right entry in the table that we are about to fulfill the promise
made in the introduction. We shall soon provide a mathematical model for the entire family
of modern-day computers. This model will enable us not only to study some theoretical
limitations on the tasks that computers can perform; it will also be a model that we can use to
show that certain operations can be done by computer. This new model will turn out to be
surprisingly like the models we have been studying so far.

Another interesting observation we can make about the bottom row of the table is that
we take a very pessimistic view of our ability to decide the important questions about this
mathematical model (which, as we see, is called a Turing machine).

We shall prove that we cannot even decide whether a given word is accepted by a given 
Turing machine. This situation is unthinkable for FAs or PDAs, but now it is one of the 
unanticipated facts of life—a fact with grave repercussions. 
There is a definite progression in the rows of this table. All regular languages are
context-free languages, and we shall see that all context-free languages are Turing machine
languages. Historically, the order of invention of these ideas is as follows:

1. Regular languages and FAs were developed by Kleene, Mealy, Moore, Rabin, and Scott 
in the 1950s. 

2. CFGs and PDAs were developed later, by Chomsky, Oettinger, Schützenberger, and
Evey, mostly in the 1960s.

3. Turing machines and their theory were developed by Alan Mathison Turing and Emil 
Post in the 1930s and 1940s. 

It is less surprising that these dates are out of order than that Turing’s work predated the 
invention of the computer itself. Turing was not analyzing a specimen that sat on the table in 
front of him; he was engaged in inventing the beast. It was directly from the ideas in his 
work on mathematical models that the first computers (as we know them) were built. This is 
another demonstration that there is nothing more practical than a good abstract theory. 

Because Turing machines will be our ultimate model for computers, they will necessarily
have output capabilities. Output is very important, so important that a program with no
output statements might seem totally useless because it would never convey to humans the 
result of its calculations. We may have heard it said that the one statement every program 
must have is an output statement. This is not exactly true. Consider the following program 
(written in no particular language): 

1. READ X 

2. IF X = 1 THEN END

3. IF X = 2 THEN DIVIDE X BY 0 

4. IF X > 2 THEN GOTO STATEMENT 4 

Let us assume that the input is a positive integer. If the program terminates naturally, 
then we know X was 1. If it terminates by creating overflow or was interrupted by some error
message warning of illegal calculation (crashes), then we know that X was 2. If we find
that our program was terminated because it exceeded our allotted time on the computer, then 
we know X was greater than 2. We shall see in a moment that the same trichotomy applies to 
Turing machines. 


DEFINITION 

A Turing machine, denoted TM, is a collection of six things: 

1. An alphabet Σ of input letters, which for clarity’s sake does not contain the blank
symbol Λ.

2. A Tape divided into a sequence of numbered cells, each containing one character or a
blank. The input word is presented to the machine one letter per cell beginning in the
leftmost cell, called cell i. The rest of the Tape is initially filled with blanks, Λ’s.


3. A Tape Head that can in one step read the contents of a cell on the Tape, replace it with
some other character, and reposition itself to the next cell to the right or to the left of the
one it has just read. At the start of the processing, the Tape Head always begins by reading
the input in cell i. The Tape Head can never move left from cell i. If it is given orders
to do so, the machine crashes. The location of the Tape Head is marked in our pictures of the Tape.

4. An alphabet Γ of characters that can be printed on the Tape by the Tape Head. This can
include Σ. Even though we allow the Tape Head to print a Λ, we call this erasing and
do not include the blank as a letter in the alphabet Γ.

5. A finite set of states including exactly one START state from which we begin execution
(and which we may reenter during execution) and some (maybe none) HALT states that
cause execution to terminate when we enter them. The other states have no function,
only names:

q₁, q₂, q₃, . . .   or   1, 2, 3, . . .

6. A program, which is a set of rules that tell us, on the basis of the state we are in and the 
letter the Tape Head has just read, how to change states, what to print on the Tape, and 
where to move the Tape Head. We depict the program as a collection of directed edges 
connecting the states. Each edge is labeled with a triplet of information: 

(letter, letter, direction) 

The first letter (either Λ or from Σ or Γ) is the character the Tape Head reads from the
cell to which it is pointing. The second letter (also Λ or from Γ) is what the Tape Head
prints in the cell before it leaves. The third component, the direction, tells the Tape
Head whether to move one cell to the right, R, or one cell to the left, L.






No stipulation is made as to whether every state has an edge leading from it for every 
possible letter on the Tape. If we are in a state and read a letter that offers no choice of path 
to another state, we crash; that means we terminate execution unsuccessfully. To terminate
execution of a certain input successfully, we must be led to a HALT state. The word on the 
input Tape is then said to be accepted by the TM. 

A crash also occurs when we are in the first cell on the Tape and try to move the Tape
Head left.

By definition, all Turing machines are deterministic. This means that there is no state 
that has two or more edges leaving it labeled with the same first letter. 

For example,

[Figure: a state with two edges leaving it whose labels begin with the same letter]

is not allowed.
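The six-part definition translates almost directly into a small simulator. The following is a minimal Python sketch, not part of the original text: the program is a dictionary (a hypothetical encoding) that maps a pair (state, letter read) to a triple (letter to print, direction, next state). Because a dictionary holds only one entry per key, a program written this way cannot have two edges leaving one state with the same first letter, so it is deterministic in the sense just defined. The Greek letter Λ stands for the blank, and the step limit is an added assumption so that the simulation always returns.

```python
BLANK = "Λ"  # the blank symbol; never a letter of the input alphabet


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """Run a TM on `word`.

    program maps (state, letter read) -> (letter printed, "L" or "R", next state).
    Returns "ACCEPT" on reaching `halt`, "CRASH" if there is no edge to follow
    or the Tape Head moves left from cell i, and "NO VERDICT" if the step
    limit runs out first.
    """
    tape = dict(enumerate(word))  # cell number (0 = cell i) -> character
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))  # unwritten cells are blank
        if key not in program:
            return "CRASH"  # no edge for the letter being read
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"  # tried to move left from cell i
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"


# A one-edge program: accept exactly the words that begin with a.
STARTS_WITH_A = {(1, "a"): ("a", "R", "HALT")}
```

For example, `run_tm(STARTS_WITH_A, "ab")` returns "ACCEPT" and `run_tm(STARTS_WITH_A, "ba")` returns "CRASH". Note that the left-move check comes before the HALT check, so an edge into HALT that moves left from cell i still crashes.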


EXAMPLE 

The following is the Tape from a TM about to run on the input aba:

[Figure: the Tape, cells i through vi, holding a, b, a followed by blanks; the Tape Head points to cell i]

The program for this TM is given as a directed graph with labeled edges as shown 
below: 


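[Figure: START state 1 goes to state 2 on (a,a,R) and on (b,b,R); state 2 goes to state 3 on (b,b,R); state 3 loops on (a,a,R) and (b,b,R); state 3 goes to HALT state 4 on (Λ,Λ,R)]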


Notice that the loop at state 3 has two labels. The edges from state 1 to state 2 could 
have been drawn as one edge with two labels. 

We start, as always, with the Tape Head reading cell i and the program in the START 
state, which is here labeled state 1. We depict this as 

1 

aba 

The number on top is the number of the state we are in. Below that are the current meaningful
contents of the string on the Tape up to the beginning of the infinite run of blanks. It is possible
that there may be a Λ inside this string. We underline the character in the cell that is
about to be read.

At this point in our example, the Tape Head reads the letter a and we follow the edge
(a, a, R) to state 2. The instructions of this edge to the Tape Head are “read an a, print an a,
move right.”

The Tape now looks like this: 



We can record the execution process by writing 

1 2 
aba aba 

At this point, we are in state 2. Because we are reading the b in cell ii, we must take the
ride to state 3 on the edge labeled (b, b, R). The Tape Head replaces the b with a b and
moves right one cell. The idea of replacing a letter with itself may seem silly, but it unifies 
the structure of TMs. 

We are now up to 

1 2 3 

aba aba aba
The Tape now looks like this:

[Figure: the Tape, cells i through iv, with the Tape Head now at cell iii]


We are in state 3 reading an a, so we loop. That means we stay in state 3, but we move
the Tape Head to cell iv:

3 → 3
aba abaΛ

This is one of those times when we must indicate a Λ as part of the meaningful contents
of the Tape.

We are now in state 3 reading a Λ, so we move to state 4:

3 → 4
abaΛ abaΛΛ

The input string aba has been accepted by this TM. This particular machine did not
change any of the letters on the Tape, so at the end of the run the Tape still reads abaΛ . . . .
This is not a requirement for the acceptance of a string, just a phenomenon that happened
this time.

In summary, the whole execution can be depicted by the following execution chain,
also called a process chain or trace of execution, or simply a trace:

1       2       3       3
aba  →  aba  →  aba  →  abaΛ  →  HALT

This is a new use for the arrow. It is neither a production nor a derivation. 

Let us consider which input strings are accepted by this TM. Any first letter, a or b, will
lead us to state 2. From state 2 to state 3, we require that we read the letter b. Once in state 3,
we stay there as the Tape Head moves right and right again, moving perhaps many cells until
it encounters a Λ. Then we get to the HALT state and accept the word. Any word that
reaches state 3 will eventually be accepted. If the second letter is an a, then we crash at
state 2. This is because there is no edge coming from state 2 with directions for what happens
when the Tape Head reads an a.

The language of words accepted by this machine is: All words over the alphabet {a, b}
in which the second letter is a b.

This is a regular language because it can also be defined by the regular expression 

(a + b)b(a + b)* 

This TM is also reminiscent of FAs, making only one pass over the input string, moving
its Tape Head always to the right, and never changing a letter it has read. TMs can do more
tricks, as we shall soon see.


■
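To check the claim that this machine accepts exactly the words of (a + b)b(a + b)*, one can simulate it on every short word and compare with the regular expression. The sketch below repeats the small simulator (the dictionary encoding of edges is an assumption of this sketch, not the book's notation) and uses Python's `re.fullmatch` with the pattern `[ab]b[ab]*`, which is the same regular expression in Python's syntax. Agreement on all words up to length 8 is evidence, not a proof.

```python
import re
from itertools import product

BLANK = "Λ"


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """(state, read) -> (print, direction, next state); ACCEPT / CRASH / NO VERDICT."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:
            return "CRASH"
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"


# The machine of this example; its HALT state 4 is written as "HALT".
SECOND_LETTER_B = {
    (1, "a"): ("a", "R", 2),
    (1, "b"): ("b", "R", 2),
    (2, "b"): ("b", "R", 3),
    (3, "a"): ("a", "R", 3),
    (3, "b"): ("b", "R", 3),
    (3, BLANK): (BLANK, "R", "HALT"),
}

# Compare the TM with (a + b)b(a + b)* on every word of length 0 through 8.
agree = all(
    (run_tm(SECOND_LETTER_B, "".join(p)) == "ACCEPT")
    == bool(re.fullmatch("[ab]b[ab]*", "".join(p)))
    for length in range(9)
    for p in product("ab", repeat=length)
)
```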


EXAMPLE 


Consider the following TM: 
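[Figure: the program of this TM. START state 1 goes to state 2 on (a,A,R). State 2 loops on (a,a,R) and (B,B,R) and goes to state 3 on (b,B,L). State 3 loops on (B,B,L), goes to state 4 on (a,a,L), and goes to state 5 on (A,A,R). State 4 loops on (a,a,L) and returns to state 1 on (A,A,R). State 5 loops on (B,B,R) and goes to HALT on (Λ,Λ,R).]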


We have only drawn the program part of the TM, because initial appearance of the Tape 
depends on the input word. This is a more complicated example of a TM. We analyze it by 
first explaining what it does and then recognizing how it does it. 

The language this TM accepts is {aⁿbⁿ}.

By examining the program, we can see that the Tape Head may print any of the letters
a, A, or B or a Λ, and it may read any of the letters a, b, A, or B or a blank. Technically, the
input alphabet is Σ = {a, b} and the output alphabet is Γ = {a, A, B}, because Λ is the
symbol for a blank or empty cell and is not a legal character in an alphabet. Let us describe the
algorithm, informally in English, before looking at the directed graph that is the program.

Let us assume that we start with a word of the language {aⁿbⁿ} on the Tape. We begin by
taking the a in the first cell and changing it to the character A. (If the first cell does not contain
an a, the program should crash. We can arrange this by having only one edge leading from
START and labeling it to read an a.) The conversion from a to A means that this a has been
counted. We now want to find the b in the word that pairs off with this a. So, we keep moving
the Tape Head to the right, without changing anything it passes over, until it reaches the first b.
When we reach this b, we change it into the character B, which again means that it too has
been counted. Now we move the Tape Head back down to the left until it reaches the first
uncounted a. The first time we make our descent down the Tape, this will be the a in cell ii.

How do we know when we get to the first uncounted a? We cannot tell the Tape Head
to “find cell ii.” This instruction is not in its repertoire. We can, however, tell the Tape Head
to keep moving to the left until it gets to the character A. When it hits the A, we bounce one
cell to the right and there we are. In doing this, the Tape Head passed through cell ii on its
way down the Tape. However, when we were first there, we did not recognize it as our
destination. Only when we bounce off of our marker, the first A encountered, do we realize where
we are. Half the trick in programming TMs is to know where the Tape Head is by bouncing
off of landmarks.

When we have located this leftmost uncounted a, we convert it into an A and begin
marching up the Tape looking for the corresponding b. This means that we skip over some
a’s and over the symbol B, which we previously wrote, leaving them unchanged, until we get
to the first uncounted b. Once we have located it, we have found our second pair of a and b.
We count this second b by converting it into a B, and we march back down the Tape looking
for our next uncounted a. This will be in cell iii. Again, we cannot tell the Tape Head to
“find cell iii.” We must program it to find the intended cell. The same instructions as given
last time work again. Back down to the first A we meet and then up one cell. As we march
down, we walk through a B and some a’s until we first reach the character A. This will be the
second A, the one in cell ii. We bounce off this to the right, into cell iii, and find an a. This
we convert to A and move up the Tape to find its corresponding b.
This time marching up the Tape, we again skip over a’s and B’s until we find the first b.
We convert this to B and march back down, looking for the first unconverted a. We repeat the
pairing process over and over.

What happens when we have paired off all the a’s and b’s? After we have converted our
last b into a B and we move left, looking for the next a, we find that after marching left back
through the last of the B’s, we encounter an A. We recognize that this means we are out of
little a’s in the initial field of a’s at the beginning of the word.

We are about ready to accept the word, but we want to make sure that there are no more 
b's that have not been paired off with a's, or any extraneous a’s at the end. Therefore, we 
move back up through the field of B's to be sure that they are followed by a blank; otherwise, 
the word initially may have been aaabbbb or aaabbba. 

When we know that we have only A’s and B’s on the Tape, in equal number, we can
accept the input string.

The following is a picture of the contents of the Tape at each step in the processing of 
the string aaabbb. Remember, in a trace the Tape Head is indicated by the underlining of the 
letter it is about to read: 


aaabbb
Aaabbb
Aaabbb
Aaabbb
AaaBbb
AaaBbb
AaaBbb
AaaBbb
AAaBbb
AAaBbb
AAaBbb
AAaBBb
AAaBBb
AAaBBb
AAaBBb
AAABBb
AAABBb
AAABBb
AAABBB
AAABBB
AAABBB
AAABBB
AAABBB
AAABBB
AAABBBΛ
HALT

Based on this algorithm, we can define a set of states that have the following meanings: 

State 1 This is the START state, but it is also the state we are in whenever we are
about to read the lowest unpaired a. In a PDA we can never return to the
START state, but in a TM we can. The edges leaving from here must convert
this a to the character A and move the Tape Head right and enter state 2.

State 2 This is the state we are in when we have just converted an a to an A and we are
looking for the matching b. We begin moving up the Tape. If we read another a, we
leave it alone and continue to march up the Tape, moving the Tape Head always to
the right. If we read a B, we also leave it alone and continue to move the Tape
Head right. We cannot read an A while in this state. In this algorithm, all the A’s
remain to the left of the Tape Head once they are printed. If we read Λ while we are
searching for the b, we are in trouble because we have not paired off our a. So, we
crash. The first b we read, if we are lucky enough to find one, is the end of the
search in this state. We convert it to B, move the Tape Head left, and enter state 3.

State 3 This is the state we are in when we have just converted a b to B. We should now
march left down the Tape, looking for the field of unpaired a’s. If we read a B, we
leave it alone and keep moving left. If and when we read an a, we have done our
job. We must then go to state 4, which will try to find the leftmost unpaired a. If
we encounter the character b while moving to the left, something has gone very
wrong and we should crash. If, however, we encounter the character A before we
hit an a, we know that we have used up the pool of unpaired a’s at the beginning
of the input string and we may be ready to terminate execution. Therefore, we
leave the A alone and reverse directions to the right and move into state 5.

State 4 We get here when state 3 has located the rightmost end of the field of unpaired 
a’s. The Tape and Tape Head situation looks like this: 



In this state, we must move left through a block of solid a’s (we crash if we
encounter a b, B, or Λ) until we find an A. When we do, we bounce off it to the
right, which lands us at the leftmost uncounted a. This means that we should
next be in state 1 again.

State 5 When we get here, it must be because state 3 found that there were no
unpaired a’s left and it bounced us off the rightmost A. We are now reading the
leftmost B as in the picture below:



It is now our job to be sure that there are no more a’s or b’s left in this word.
We want to scan through solid B’s until we hit the first blank. Because the
program never printed any blanks, this will indicate the end of the input string. If
there are no more surprises before the Λ, we then accept the word by going to
the state HALT. Otherwise, we crash. For example, aabba would become
AABBa and then crash because, while searching for the Λ, we find an a.

This explains the TM program that we began with. It corresponds to the depiction above 
state for state and edge for edge. 

Let us trace the processing of the input string aabb by looking at its execution chain:

1        2        2        3        4        1
aabb  →  Aabb  →  Aabb  →  AaBb  →  AaBb  →  AaBb  →

2        2        3        3        5        5        5
AABb  →  AABb  →  AABB  →  AABB  →  AABB  →  AABB  →  AABBΛ  →  HALT
It is clear that any string of the form aⁿbⁿ will reach the HALT state. To show that any string
that reaches the HALT state must be of the form aⁿbⁿ, we trace backward. To reach HALT,
we must get to state 5 and read a Λ. To be in state 5, we must have come from state 3 by
reading an A, and then in state 5 we must have read some number of B’s while moving to the
right. So at the point we are in state 3 ready to terminate, the Tape and Tape Head situation is
as shown below:

[Figure: a run of A’s followed by a run of B’s and then blanks; the Tape Head reads the last A]

To be in state 3 means we have begun at START and circled around the loop some
number of times:



Every time we go from START to state 3, we have converted an a to an A and a b to a B. No
other edge in the program of this TM changes the contents of any cell on the Tape. However
many B’s there are, there are just as many A’s. Examination of the movement of the Tape
Head shows that all the A’s stretch in one connected sequence of cells starting at cell i. To
go from state 3 to HALT shows that the whole Tape has been converted to A’s, then B’s
followed by blanks. If we put together all of this, to get to HALT, the input word must be aⁿbⁿ
for some n > 0. ■
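The argument above can also be spot-checked by machine. In this sketch the edges are read off from the descriptions of states 1 through 5 (the drawing itself is not reproduced here, so the table is a reconstruction), and the machine is compared with the predicate "w = aⁿbⁿ for some n > 0" on every word over {a, b} of length at most 8. The simulator and its dictionary encoding are assumptions of the sketch, not the book's notation.

```python
from itertools import product

BLANK = "Λ"


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """(state, read) -> (print, direction, next state); ACCEPT / CRASH / NO VERDICT."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:
            return "CRASH"
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"


# Edges reconstructed from the descriptions of states 1 through 5.
A_N_B_N = {
    (1, "a"): ("A", "R", 2),           # count an a
    (2, "a"): ("a", "R", 2),           # skip uncounted a's ...
    (2, "B"): ("B", "R", 2),           # ... and counted b's
    (2, "b"): ("B", "L", 3),           # count the matching b
    (3, "B"): ("B", "L", 3),
    (3, "a"): ("a", "L", 4),           # unpaired a's remain
    (3, "A"): ("A", "R", 5),           # no unpaired a's: check the tail
    (4, "a"): ("a", "L", 4),
    (4, "A"): ("A", "R", 1),           # bounce off the last A
    (5, "B"): ("B", "R", 5),
    (5, BLANK): (BLANK, "R", "HALT"),
}


def in_language(w):
    """True exactly when w = a^n b^n for some n > 0."""
    n = len(w) // 2
    return n > 0 and w == "a" * n + "b" * n


agree = all(
    (run_tm(A_N_B_N, "".join(p)) == "ACCEPT") == in_language("".join(p))
    for length in range(9)
    for p in product("ab", repeat=length)
)
```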


EXAMPLE 


Consider the following TM: 


[Figure: the program of this TM, with START state 1, states 2 through 7, and HALT state 8]

This looks like another monster, yet it accepts the familiar language PALINDROME
and does so by a very simple deterministic algorithm.

We read the first letter of the input string and erase it, but we remember whether it was an a 
or a b. We go to the last letter and check to be sure it is the same as what used to be the first
letter. If not, we crash, but if so, we erase it too. We then return to the front of what is left of the
input string and repeat the process. If we do not crash while there are any letters left, then when
we get to the condition where the whole Tape is blank, we accept the input string. This means 
that we reach the HALT state. Notice that the input string itself is no longer on the Tape. 

The process, briefly, works like this: 

abbabba 

bbabba 

bbabb 

babb 

bab 

ab 

a 

Λ

We mentioned above that when we erase the first letter, we remember what it was as
we march up to the last letter. Turing machines have no auxiliary memory device, like a
PUSHDOWN STACK, where we could store this information, but there are ways around
this. One possible method is to use some of the blank space farther down the Tape for
making notes. In this case, we use a different trick. The memory of what letter was erased is
stored in the path through the program the input takes. If the first letter is an a, we are off
on the state 2-state 3-state 4 loop. If the first letter is a b, we are off on the state 5-state
6-state 7 loop.

All of this is clear from the descriptions of the meanings of the states below: 

State 1 When we are in this state, we read the first letter of what is left of the input
string. This could be because we are just starting and reading cell i or because
we have been returned here from state 4 or 7. If we read an a, we change it to a
Λ (erase it), move the Tape Head to the right, and progress to state 2. If we
read a b, we erase it and move the Tape Head to the right and progress to state 5.
If we read a Λ where we expect the string to begin, it is because we have
erased everything, or perhaps we started with the input word Λ. In either case,
we accept the word and we shall see that it is in EVENPALINDROME:



State 2 We get here because we have just erased an a from the front of the remaining
input string and we want to get to the last letter of the remaining input string to
see whether it too is an a. So, we move to the right through all the a’s and b’s
left in the input until we get to the end of the string at the first Λ. When that
happens, we back up one cell (to the left) and move into state 3:

[Figure: state 2 loops on (a,a,R) and (b,b,R) and goes to state 3 on (Λ,Λ,L)]

State 3 We get here only from state 2, which means that the letter we erased at the
start of the string was an a and state 2 has requested us now to read the last
letter of the string. We found the end of the string by moving to the right until we
hit the first Λ. Then we bounced one cell back to the left. If this cell is also
blank, then there are only blanks left on the Tape. The letters have all been
successfully erased and we can accept the word. Everything erased was in the
form of an ODDPALINDROME, but it had a middle letter of a that was the
last non-Λ on the Tape. So, we go to HALT. If there is something left of the
input string, but the last letter is a b, the input string was not a palindrome.
Therefore, we crash by having no labeled edge to go on. If the last letter is an
a, then we erase it, completing the pair, and begin moving the Tape Head left,
down to the beginning of the string again to pair off another set of letters:

[Figure: state 3 goes to HALT on (Λ,Λ,R) and to state 4 on (a,Λ,L)]

Notice that when we read the Λ and move to HALT, we still need to include in
the edge’s label instructions to write something and move the Tape Head
somewhere. The label (Λ, a, R) would work just as well, or (Λ, B, R).
However, (Λ, a, L) might be a disaster. We might have started with a one-letter
word, say, a. State 1 erases this a. Then state 2 reads the Λ in cell ii and returns
us to cell i where we read the blank. If we try to move left from cell i, we crash
on the very verge of accepting the input string.

State 4 Like state 2, this is a travel state searching for the beginning of what is left of
the input string. We keep heading left fearlessly because we know that cell i
contains a Λ, so we shall not fall off the edge of the earth and crash by going
left from cell i. There may be a whole section of Λ’s, so the first Λ is not
necessarily in cell i. When we hit the first Λ, we back up one position to the right,
setting ourselves up in state 1 ready to read the first letter of what is left of the
string:

[Figure: state 4 loops on (a,a,L) and (b,b,L) and goes to state 1 on (Λ,Λ,R)]

State 5 We get to state 5 only from state 1 when the letter it has just erased was a b.
In other words, state 5 corresponds exactly to state 2 but for strings whose
remainder begins with a b. It too searches for the end of the string:

[Figure: state 5 loops on (a,a,R) and (b,b,R) and goes to state 6 on (Λ,Λ,L)]

State 6 We get here when we have erased a b in state 1 and found the end of the string
in state 5. We examine the letter at hand. If it is an a, then the string began with
b and ended with a, so we crash since it is not in PALINDROME. If it is a b,
we erase it and hunt for the beginning again. If it is a Λ, we know that the
string was an ODDPALINDROME with middle letter b. This is the twin of
state 3.

State 7 This state is exactly the same as state 4. We try to find the beginning of the
string.

Putting together all these states, we get the picture we started with. Let us trace the
running of this TM on the input string ababa:

1         2         2         2         2
ababa  →  Λbaba  →  Λbaba  →  Λbaba  →  Λbaba  →

2          3         4         4         4
ΛbabaΛ  →  Λbaba  →  ΛbabΛ  →  ΛbabΛ  →  ΛbabΛ  →

4         1         5         5         5
ΛbabΛ  →  ΛbabΛ  →  ΛΛabΛ  →  ΛΛabΛ  →  ΛΛabΛ  →

6         7         7         1         2
ΛΛabΛ  →  ΛΛaΛΛ  →  ΛΛaΛΛ  →  ΛΛaΛΛ  →  ΛΛΛΛΛ  →

3
ΛΛΛΛΛ  →  HALT ■
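The seven state descriptions can be encoded the same way. The edge table below is a reconstruction from the descriptions of states 1 through 7 (for instance, state 3 reads a, writes Λ, and moves left to erase the matching last letter), with the HALT edges taken to be (Λ, Λ, R); the simulator and its encoding are assumptions of this sketch. The check compares the machine with Python's `w == w[::-1]` on every word up to length 8; as the description of state 1 says, the empty word is accepted.

```python
from itertools import product

BLANK = "Λ"


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """(state, read) -> (print, direction, next state); ACCEPT / CRASH / NO VERDICT."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:
            return "CRASH"
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"


PALINDROME_TM = {
    (1, "a"): (BLANK, "R", 2),          # erase a first a; remember it via state 2
    (1, "b"): (BLANK, "R", 5),          # erase a first b; remember it via state 5
    (1, BLANK): (BLANK, "R", "HALT"),   # nothing left: accept
    (2, "a"): ("a", "R", 2),
    (2, "b"): ("b", "R", 2),
    (2, BLANK): (BLANK, "L", 3),        # bounce back onto the last letter
    (3, "a"): (BLANK, "L", 4),          # last letter matches: erase it
    (3, BLANK): (BLANK, "R", "HALT"),   # odd palindrome with middle a
    (4, "a"): ("a", "L", 4),
    (4, "b"): ("b", "L", 4),
    (4, BLANK): (BLANK, "R", 1),        # back to the front of what is left
    (5, "a"): ("a", "R", 5),
    (5, "b"): ("b", "R", 5),
    (5, BLANK): (BLANK, "L", 6),
    (6, "b"): (BLANK, "L", 7),
    (6, BLANK): (BLANK, "R", "HALT"),   # odd palindrome with middle b
    (7, "a"): ("a", "L", 7),
    (7, "b"): ("b", "L", 7),
    (7, BLANK): (BLANK, "R", 1),
}

agree = all(
    (run_tm(PALINDROME_TM, "".join(p)) == "ACCEPT") == ("".join(p) == "".join(p)[::-1])
    for length in range(9)
    for p in product("ab", repeat=length)
)
```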

Our first example was no more than a converted FA, and the language it accepted was
regular. The second example accepted a language that was context-free and nonregular and
the TM given employed separate alphabets for writing and reading. The third machine
accepted a language that was also context-free but that could be accepted only by a
nondeterministic PDA, whereas the TM that accepts it is deterministic.

We have seen that we can use the Tape for more than a PUSHDOWN STACK. In the 
last two examples, we ran up and down the Tape to make observations and changes in the 
string at both ends and in the middle. We shall see later that the Tape can be used for even 
more tasks: It can be used as work space for calculation and output. 

We shall eventually show that TMs are more powerful than PDAs because a Tape can 
do more than a STACK. However, this intuitive notion is not sufficient proof because PDAs 
have the extra power of nondeterminism whereas TMs are limited to being deterministic. 
What we are ready to demonstrate is that TMs are more powerful than FAs. 

THEOREM 46 

Every regular language has a TM that accepts exactly it. 
PROOF

Consider any regular language L. Take an FA that accepts L. Change the edge labels a and b
to (a, a, R) and (b, b, R), respectively. Change the − state to the word START. Erase the plus
sign out of each final state and instead add to each of these an edge labeled (Λ, Λ, R) leading
to a HALT state. Voilà, a TM.

We read the input string moving from state to state in the TM exactly as we would on
the FA. When we come to the end of the input string, if we are not in a TM state
corresponding to a final state in the FA, we crash when the Tape Head reads the Λ in the next cell. If
the TM state corresponds to an FA final state, we take the edge labeled (Λ, Λ, R) to HALT.
The acceptable strings are the same for the TM and the FA. ■


EXAMPLE 

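Let us build a TM to accept the language EVEN-EVEN, the collection of all strings with
an even number of a’s and an even number of b’s.

By the above algorithm, the machine is

[Figure: the four-state EVEN-EVEN FA with each a edge relabeled (a,a,R) and each b edge relabeled (b,b,R), plus an edge (Λ,Λ,R) from the START state, which is also the final state, to HALT]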
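The construction in the proof of Theorem 46 is mechanical enough to write as a function. The sketch below assumes an FA is given as a dictionary from (state, letter) to next state; the hypothetical helper `fa_to_tm` relabels each edge x as (x, x, R) and adds a (Λ, Λ, R) edge from each final state to HALT. For EVEN-EVEN the states are named by the parities of the a’s and b’s read so far, a relabeling of the book’s four states chosen for this sketch.

```python
from itertools import product

BLANK = "Λ"


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """(state, read) -> (print, direction, next state); ACCEPT / CRASH / NO VERDICT."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:
            return "CRASH"
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"


def fa_to_tm(fa_edges, final_states):
    """Relabel each FA edge x as (x, x, R); add (Λ, Λ, R) from each final state to HALT."""
    program = {(q, x): (x, "R", r) for (q, x), r in fa_edges.items()}
    for q in final_states:
        program[(q, BLANK)] = (BLANK, "R", "HALT")
    return program


# EVEN-EVEN: a state is (parity of a's, parity of b's); (0, 0) is both
# the start state and the only final state.
EVEN_EVEN_FA = {}
for pa in (0, 1):
    for pb in (0, 1):
        EVEN_EVEN_FA[((pa, pb), "a")] = (1 - pa, pb)
        EVEN_EVEN_FA[((pa, pb), "b")] = (pa, 1 - pb)

EVEN_EVEN_TM = fa_to_tm(EVEN_EVEN_FA, final_states=[(0, 0)])

agree = all(
    (run_tm(EVEN_EVEN_TM, w, start=(0, 0)) == "ACCEPT")
    == (w.count("a") % 2 == 0 and w.count("b") % 2 == 0)
    for length in range(9)
    for w in ("".join(p) for p in product("ab", repeat=length))
)
```

As in the proof, a word that ends in a nonfinal state crashes on the first Λ, because only final states have an edge that reads Λ.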


EXAMPLE

Now we shall consider a valid but problematic machine to accept the language of all strings
that have a double a in them somewhere:

[Figure: START state 1 loops on (b,b,R) and (Λ,Λ,R) and goes to state 2 on (a,a,R); state 2 returns to state 1 on (b,b,R) and goes to HALT on (a,a,R)]


The problem is that we have labeled the loop at the START state with the extra option
(Λ, Λ, R). This is still a perfectly valid TM because it fits all the clauses in the definition.
Any string without a double a that ends in the letter a will get to state 2, where the Tape
Head will read a Λ and crash. What happens to strings without a double a that end in b?
When the last letter of the input string has been read, we are in state 1. We read the first Λ
and return to state 1, moving the Tape Head farther up the Tape full of Λ’s. In fact, we loop
forever in state 1 on the edge labeled (Λ, Λ, R).

All the strings in (a + b)* can be divided into three sets: 

1. Those with a double a. They are accepted by the TM. 

2. Those without aa that end in a. They crash. 

3. Those without aa that end in b. They loop forever. ■ 
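The three-way split can be watched in a simulation, with one caveat: a finite run can show that a string reaches HALT or crashes, but it cannot by itself show that a string loops forever. The sketch below therefore reports "NO VERDICT" when a hypothetical step limit runs out; for this particular machine, the analysis above tells us that those strings are exactly the ones that loop. The edge table is read off from the description of the machine.

```python
from itertools import product

BLANK = "Λ"


def run_tm(program, word, start=1, halt="HALT", max_steps=10_000):
    """(state, read) -> (print, direction, next state); ACCEPT / CRASH / NO VERDICT."""
    tape = dict(enumerate(word))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if key not in program:
            return "CRASH"
        printed, direction, state = program[key]
        tape[head] = printed
        head += 1 if direction == "R" else -1
        if head < 0:
            return "CRASH"
        if state == halt:
            return "ACCEPT"
    return "NO VERDICT"  # step limit reached: not a proof of looping


DOUBLE_A = {
    (1, "b"): ("b", "R", 1),
    (1, BLANK): (BLANK, "R", 1),  # the problematic extra loop option
    (1, "a"): ("a", "R", 2),
    (2, "b"): ("b", "R", 1),
    (2, "a"): ("a", "R", "HALT"),
}


def expected(w):
    """The three sets described in the text."""
    if "aa" in w:
        return "ACCEPT"
    return "CRASH" if w.endswith("a") else "NO VERDICT"


agree = all(
    run_tm(DOUBLE_A, "".join(p), max_steps=500) == expected("".join(p))
    for length in range(1, 9)
    for p in product("ab", repeat=length)
)
```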

Unlike on an FA, on a TM an input string cannot just run out of gas in some middle
state. Because the input string is just the first part of an infinite Tape, there are always
infinitely many Λ’s to read after the meaningful input has been exhausted.

These three possibilities exist for every TM, although for the examples we met
previously the third set is empty. This last example is our first TM that can loop forever.

We have seen that certain PDAs also loop forever on some inputs. In Part II, this was a 
mild curiosity; in Part III, it will be a major headache. 


DEFINITION 

Every Turing machine T over the alphabet Σ divides the set of input strings into three
classes:

1. ACCEPT(T) is the set of all strings leading to a HALT state. This is also called the
language accepted by T.

2. REJECT(T) is the set of all strings that crash during execution by moving left from
cell i or by being in a state that has no exit edge that wants to read the character the
Tape Head is reading.

3. LOOP(T) is the set of all other strings, that is, strings that loop forever while running
on T. ■

We shall consider this issue in more detail later. For now, we should simply bear in 
mind the resemblance of this definition to the output-less computer program at the beginning 
of this chapter. 

While we have not yet shown that TMs can recognize all context-free languages, let us
give some justification for introducing this new mathematical model of a machine by
showing that there are some non-context-free languages that TMs can accept.


EXAMPLE 

Let us consider the non-context-free language {a^n b^n a^n}. This language can be accepted by
the following interesting procedure:

Step 1 We presume that we are reading the first letter of what remains on the input.
Initially, this means we are reading the first letter of the input string, but as the
algorithm progresses, we may find ourselves back in this step reading the first
letter of a smaller remainder. If no letters are found (a blank is read), we go to
HALT. If what we read is an a, we change it to a * or some other marker, even
Δ, and move the Tape Head right. If we read anything else, we crash. This is
all done in state 1.
Step 2 In state 2, we skip over the rest of the a’s in the initial clump of a’s, looking for
the first b. This will put us in state 3. Here, we search for the last b in the clump
of b’s: We read b’s continually until we encounter the first a (which takes us to
state 4) and then bounce off that a to the left. If after the b’s we find a Δ
instead of an a, we crash. Now that we have located the last b in the clump, we
do something clever: We change it into an a, and we move on to state 5. The
reason it took so many TM states to do this simple job is that if we allowed,
say, state 2 to skip over b’s as well as a’s, it would merrily skip its way to the
end of the input. We need a separate TM state to keep track of where we are in
the data.

Step 3 The first thing we want to do here is find the end of the clump of a’s (this is
the second clump of a’s in the input). We do this in state 5 by reading right
until we get to a Δ. If we read a b after this second clump of a’s, we crash. If we
get to the Δ, we know that the input is, in fact, of the form a*b*a*. When we
have located the end of this clump, we turn the last two a’s into Δ’s. Because
we changed the last b into an a, this is tantamount to killing off a b and an a.
If we had turned that b into a Δ, it would have meant Δ’s in the middle of the
input string and we would have had trouble telling where the real ends of the
string were. Instead, we turned a b into an a and then erased two a’s off the
right end.

Step 4 We are now in state 8 and we want to return to state 1 and do this whole thing
again. Nothing could be easier. We skip over a’s and b’s, moving the Tape
Head left until we encounter the rightmost of the *’s that fill the front end of
the Tape. Then we move one cell to the right and begin again in state 1.
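The four steps can also be followed one whole pass at a time. The Python sketch below is a paraphrase under our own assumptions, not the eight-state machine itself (its diagram is not reproduced here): the hypothetical `run_anbnan` marks, converts, and erases exactly as Steps 1 through 4 describe and records the Tape contents after each pass.

```python
BLANK = "Δ"


def run_anbnan(word):
    """Pass-by-pass paraphrase of Steps 1-4; returns (accepted, Tape after each pass)."""
    tape = list(word) + [BLANK]
    start = 0                       # first cell after the *'s at the front
    passes = []
    while True:
        # Step 1 (state 1): a blank means HALT; an a is marked with *; else crash.
        if tape[start] == BLANK:
            return True, passes
        if tape[start] != "a":
            return False, passes
        tape[start] = "*"
        i = start + 1
        # Step 2 (states 2-4): skip the rest of the first a's, then the b's.
        while tape[i] == "a":
            i += 1
        if tape[i] != "b":
            return False, passes    # no b where one is needed: crash
        while tape[i] == "b":
            i += 1
        if tape[i] != "a":
            return False, passes    # anything but an a after the b's: crash
        tape[i - 1] = "a"           # change the last b into an a
        # Step 3 (states 5-7): run to the end of the second a's, erase the last two.
        while tape[i] == "a":
            i += 1
        if tape[i] != BLANK:
            return False, passes    # anything but a blank after the second a's: crash
        tape[i - 1] = tape[i - 2] = BLANK
        passes.append("".join(c for c in tape if c != BLANK))
        # Step 4 (state 8): rewind to the cell just after the rightmost *.
        start += 1


print(run_anbnan("aaabbbaaa"))
```

For aaabbbaaa the passes leave *aabbaa, then **aba, then ***, after which state 1 reads a blank and halts; the final Tape holds one * per b of the input.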


The TM looks like this: 



Let us trace the action of this machine on the input string aaabbbaaa:

[Trace figure: Tape snapshots during the first pass, from START aaabbbaaa through *aabbbaaa and *aabbaaaa to *aabbaa in state 8]

■
After designing the machine and following the trace, we should be aware of several things:

1. The only words accepted are of the form a^n b^n a^n (here, n = 0, 1, 2, 3, ...).

2. When the machine halts, the Tape will hold as many *’s as there were b’s in the input.

3. If the input was a^m b^m a^m, the Tape Head will be in cell (m + 2) when the machine halts.


THE SUBPROGRAM INSERT 

Sometimes in the running of a Turing machine, we may wish to insert a character into the 
string on the Tape exactly at the spot where the Tape Head is pointing. This means that the 
newly inserted character will occupy this cell and every character on the Tape to the right of 
it will be shifted one cell farther up the Tape. The data on the Tape to the left of the insertion 
point will be left alone. We allow for the possibility that the insertion point is cell i. After 
this insertion takes place, we shall want the Tape Head to point to the cell to the right of the 
inserted character. 

The part of the TM program that can effect such an insertion need not depend on
whatever else the TM is doing. It is an independent subprogram, and once it is written, we can
incorporate it into any other TM program by indicating that we are calling upon the insertion
subprogram and specifying what character we wish to insert. We can insert an a by drawing
the picture

[Figure: a box labeled INSERT a]

or a b or # by the pictures

[Figure: boxes labeled INSERT b and INSERT #]




Now let us write a piece of TM program to insert a b into a Tape on which the existing
characters are all a’s, b’s, and X’s followed, of course, by infinitely many blanks. The first thing
we shall have the program do is insert a Q as a marker in the cell into which we are going to
put the b. The reason we do not just write a b into this cell immediately is that the Tape
Head must move along up the Tape and then return to the proper cell to the right of the
insertion cell; it must be able to locate this spot.

Let us call the state in which our subprogram starts state 1. In this state, we read a
character (either a, b, or X) and then we write a Q and move the Tape Head to the right. In
this next cell, we have to write exactly what it was that was displaced in the previous cell.
This requires some memory. The memory we use will be in the form of keeping separate
states that remember the displaced character. Let state 2 remember that what was just
displaced was an a. Let state 3 remember that what was just displaced was a b. Let state 4
remember that what was just displaced was an X. In our example, the character set for the
Tape contained only three possibilities. This is a simplification that makes the diagram we
shall produce more easily understood. But it will be clear that any finite character set can
be shifted to the right by the same trick of creating a separate state for every character just
erased.

If we are in state 2 and we now read a b, we remember that we must replace the a
that was displaced, so we write an a, but now we realize that we have just displaced a b,
which we owe to the Tape in the next cell. This means that we belong in state 3, which
serves as just such a memory device. Therefore, we draw an edge from state 2 to state 3
and label it (b, a, R). If we are in state 2 and we read an X, we go to state 4 on an edge
labeled (X, a, R). In both cases, we have paid our debt of one a to the Tape and created a
new debt we will pay with the next instruction. If we are in state 2 and we read an a, we
will return to state 2 on a loop labeled (a, a, R). We have paid the debt of one a but now
owe another.

The situation for state 3 is similar. Whatever we read, we write the b that we owe and go
to the state that remembers what character was sacrificed for the b. We have an edge to state
2 labeled (a, b, R), an edge to state 4 labeled (X, b, R), and a loop back to state 3 labeled
(b, b, R). Also from state 4 we have an edge to state 2 labeled (a, X, R), an edge to state 3
labeled (b, X, R), and a loop labeled (X, X, R).

Eventually from state 2, 3, or 4, we will run out of characters and meet a Δ. When this
happens, we go to a new state, state 5, from which we begin the rewinding process of
returning the Tape Head to the desired location. On our way to state 5, we must write the last
character owed to the Tape. This means that the edge from 2 to 5 is labeled (Δ, a, R). The
edge from 3 to 5 is labeled (Δ, b, R). And the edge from 4 to 5 is labeled (Δ, X, R).

In state 5, we assume that we are reading another Δ because the character string has
ended. This Δ we leave alone and move the Tape Head down to the left and go to state 6.
State 6 moves the Tape Head over to the left in search of the Q, looping and not changing
what it reads. When it does reach the inevitable Q (which we know exists because we put it
there ourselves), we move to state 7, replacing the Q with the b that was the character we
wished to insert in the first place, and move the Tape Head to the right. It is clear that to
insert any other character, all we would have to do is to change one component of the label of
the edge from state 6 to state 7.
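The seven states translate into the Python sketch below, assuming the Tape alphabet {a, b, X}, a Tape given as a list with no blanks inside the data, and a Tape Head that starts on a nonblank cell. The variable `owed` stands in for the memory states 2, 3, and 4; the function name `insert` is ours, and Q is the marker from the text.

```python
BLANK = "Δ"
MARK = "Q"   # temporary marker for the insertion cell, as in the text


def insert(tape, head, ch):
    """Insert ch at cell `head`, shifting the rest of the data one cell right.

    Returns (new_tape, new_head); the Tape Head ends just right of the inserted ch.
    """
    tape = list(tape) + [BLANK, BLANK]   # room for the shifted data plus one blank
    # State 1: remember the displaced character, write Q, move right.
    owed = tape[head]
    tape[head] = MARK
    pos = head + 1
    # States 2-4: write what we owe, remember what we just displaced, move right.
    while tape[pos] != BLANK:
        owed, tape[pos] = tape[pos], owed
        pos += 1
    # Edge into state 5: pay the last debt on the first blank.
    tape[pos] = owed
    # State 5 reads the next blank and steps back left, so state 6 starts at `pos`.
    # State 6: move left, changing nothing, until the Q is found.
    while tape[pos] != MARK:
        pos -= 1
    # Edge from state 6 to state 7: replace Q with the inserted character, move right.
    tape[pos] = ch
    return tape, pos + 1


new_tape, new_head = insert(list("bXab"), 1, "b")
print("".join(c for c in new_tape if c != BLANK), new_head)   # bbXab 2
```

Keeping the debt in one variable is a convenience of Python; a TM has no such variable, which is exactly why the text needs one memory state per character of the Tape alphabet.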

From state 7, we return to the rest of the TM program. The subroutine INSERT b looks
like this:

[Figure: the INSERT b subprogram, states 1 through 7, exiting at Out]


The usefulness of the subprogram INSERT can be seen immediately from the fact that when
we begin processing an input string, we run the risk of moving the Tape Head off the Tape
by inadvertently instructing it to move left when it is, in fact, in cell i, thereby causing an
unanticipated crash. To prevent this, we can always begin all TM processing by inserting a
brick wall, #, into cell i as the first step of the program. When moving the Tape Head left
down the Tape, we can always be careful to bounce off the brick wall if it is encountered.
The entire input string is then bounded by # on the left and Δ on the right.


EXAMPLE 

Let us consider a TM to accept the language EQUAL, of all strings with the same number of
a’s and b’s. EQUAL is context-free but nonregular, and so the algorithm of Theorem 46
(p. 445) cannot be employed.

The algorithm we do propose (although it is by no means the best) is to run an
alternating series of search and destroy missions. We will start by inserting a # into cell i. Then from
cell ii on up we seek an a. When we find our first, we change it into an X and return the Tape
Head to cell ii. Then we search up the Tape for a b. When we find the first, we change it into
an X and return the Tape Head to cell ii. We then go back and search for an a again, and so
forth. The process will stop when we look for an a but do not find any by the time we reach
Δ. We then scan down the Tape to be sure that all the cells contain X’s and there are no
unmatched b’s left. When we encounter # on this pass, we can accept the input.
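As a check on the logic (not a transcription of the machine, whose diagram is not reproduced here), the following Python sketch runs the same alternating search-and-destroy missions on a list that stands for the Tape, with # in cell i; `accepts_equal` is a hypothetical name.

```python
BLANK = "Δ"


def accepts_equal(word):
    """Search-and-destroy test for EQUAL on a Tape that starts with # in cell i."""
    tape = ["#"] + list(word) + [BLANK]
    while True:
        # Seek an a, starting from cell ii (index 1) and skipping b's and X's.
        i = 1
        while tape[i] not in ("a", BLANK):
            i += 1
        if tape[i] == BLANK:
            # No a left: scan back down to #; every cell passed must hold an X.
            return all(c == "X" for c in tape[1:i])
        tape[i] = "X"
        # Return to cell ii and seek a b, skipping a's and X's.
        j = 1
        while tape[j] not in ("b", BLANK):
            j += 1
        if tape[j] == BLANK:
            return False          # an a with no matching b: the TM would crash here
        tape[j] = "X"


for w in ["baab", "", "aab", "abb"]:
    print(repr(w), accepts_equal(w))
```

Each round trip converts one a and one b into X’s, so the number of passes grows with the input length and each pass rescans from cell ii, which is why the text calls the method far from the best.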

The machine we built is on the next page. 
Let us follow the operation on baab starting in state 6. Starting in state 6 means that we
have already inserted a # to the left of the input on the Tape.



6 #baab → 6 #baab → 7 #bXab → 7 #bXab → 8 #bXab
→ 9 #XXab → 6 #XXab → 6 #XXab → 6 #XXab → 7 #XXXb
→ 7 #XXXb → 7 #XXXb → 8 #XXXb → 8 #XXXb → 8 #XXXb
→ 8 #XXXb → 9 #XXXX → 9 #XXXX → 9 #XXXX → 9 #XXXX
→ 6 #XXXX → 6 #XXXX → 6 #XXXX → 6 #XXXX → 6 #XXXXΔ
→ 10 #XXXXΔ → 10 #XXXX → 10 #XXXX → 10 #XXXX → 10 #XXXX → HALT


Notice that even after we have turned all a’s and b’s into X’s, we still have many steps 
left to check that there are no more non-X characters left. ■ 


THE SUBPROGRAM DELETE

For our last example, we shall build a TM subprogram that deletes; that is, it erases the
contents of the cell the Tape Head is initially pointing to, moving the contents of each of the
nonempty cells to its right down one cell to the left to close up the gap and leaving the Tape
Head positioned one cell past where it was at the start. For example,



[Figure: DELETE applied to the Tape ... F R I E N D Δ ... with the Tape Head on the R leaves ... F I E N D Δ ...]


Just as with INSERT, the exact program of DELETE depends on the alphabet of letters
found on the Tape.

Let us suppose the characters on the Tape are from the alphabet {a, b, c}. The
subprogram to DELETE that is analogous to INSERT is

[Figure: the DELETE subprogram for the alphabet {a, b, c}]



What we have done here is (1) erased the target cell, (2) moved to the right end of the
non-Δ data, and (3) worked our way back down the Tape, running the inverse of INSERT.
We could just as easily have done the job on one pass up the Tape, but then the Tape Head
would have been left at the end of the data and we would have lost our place; there would be
no memory of where the deleted character used to be. The way we have written it, the Tape
Head is left in the cell immediately after the deletion cell.

Notice that although INSERT required us to specify what character is to be inserted, 
DELETE makes no such demand—it kills whatever it finds. 
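The three phases of DELETE can be sketched in Python as follows, assuming a Tape given as a list of symbols with no blanks inside the data. A TM must spend one state per letter of the alphabet to remember the character being carried down the Tape; here the variable `carried` plays that role, so the sketch is not tied to {a, b, c}. The function name `delete` is ours.

```python
BLANK = "Δ"


def delete(tape, head):
    """Erase the cell under the head and close the gap; return (new_tape, new_head)."""
    tape = list(tape) + [BLANK]   # make sure at least one blank follows the data
    tape[head] = BLANK            # (1) erase the target cell
    pos = head + 1
    while tape[pos] != BLANK:     # (2) run right to the end of the non-blank data
        pos += 1
    pos -= 1                      # back up onto the last data cell (or the erased target)
    carried = BLANK               # (3) walk left, moving each symbol down one cell
    while tape[pos] != BLANK:
        carried, tape[pos] = tape[pos], carried
        pos -= 1
    tape[pos] = carried           # fill the erased target cell
    return tape, pos + 1          # Tape Head ends one cell past the deletion cell


new_tape, new_head = delete(list("FRIEND"), 1)
print("".join(c for c in new_tape if c != BLANK), new_head)   # FIEND 2
```

Walking back down toward the erased cell is what lets the head stop in the right place: the blank left in the target cell serves as the landmark, much as the Q marker does in INSERT.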


EXAMPLE 

We can use the subprogram DELETE to accept the language EQUAL by the following (also
wasteful) algorithm. First, INSERT # into cell i. As before, find the first a and delete it and
return the Tape Head to cell i. Now find the first b and delete it. Repeat this process until the
hunt for the a is unsuccessful, that is, the Tape Head does not catch an a here. It finds a Δ
first. Now move one cell to the left, and if what is read is the #, the string is accepted;
otherwise, what will be found are excess b’s. If the input had excess a’s, the program would crash
in the hunt for the matching b. ■
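In the Python sketch below, which is only a high-level paraphrase of the algorithm just described, `list.index` finds the first a or b, `del` stands in for the DELETE subprogram (it closes the gap the same way), and the end of the list plays the role of the first blank; the function name is hypothetical.

```python
def accepts_equal_by_delete(word):
    """EQUAL by repeated deletion of the first a and then the first b."""
    tape = ["#"] + list(word)          # INSERT # into cell i
    while True:
        if "a" not in tape:            # the hunt for an a reaches the blank first
            return tape[-1] == "#"     # step left: # means accept, a b means excess b's
        del tape[tape.index("a")]      # DELETE the first a
        if "b" not in tape:
            return False               # excess a's: crash hunting for the matching b
        del tape[tape.index("b")]      # DELETE the first b


for w in ["baab", "abab", "aab", "bba"]:
    print(repr(w), accepts_equal_by_delete(w))
```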
PROBLEMS


For Problems 1 and 2, consider the following TM: 





1. Trace the execution chains of the following input strings on this machine: 

(i) aaa 

(ii) aba 

(iii) baaba 

(iv) ababb 

2. The language accepted by this TM is all words with an odd number of letters that have a 
as the middle letter. Show that this is true by explaining the algorithm the machine uses 
and the meaning of each state. Pay attention to the two necessary parts that must always 
be demonstrated: 

(i) Anything that has an a in the middle will get to HALT. 

(ii) Anything that gets to HALT has an a in the middle. 

3. (i) Build a TM that accepts the language of all words that contain the substring bbb. 

(ii) Build a TM that accepts the language of all words that do not contain the substring bbb.

4. Build a TM that accepts the language ODDPALINDROME. 

5. Build a TM that accepts all strings with more a’s than b’s, the language MOREA.

6. (i) Build a TM that accepts the language {a^n b^(n+1)}.

(ii) Build a TM that accepts the language {a^n b^(2n)}.

7. (i) Show that the TM given in this chapter for the language PALINDROME has more
states than it needs by coalescing states 4 and 7.

(ii) Show that the TM given in this chapter for the language {a^n b^n} can be drawn with
one fewer state.




Problems 8 through 10 refer to the following TM. We assume that the input string is put on
the Tape with the symbol # inserted in front of it in cell i. For example, the input ba will be
run with the Tape initially in the form #baΔ .... In this chapter, we saw how to do this
using TM states. Here, consider it already done. The TM is then



8. Trace the execution chains of the following input strings on this machine: 

(i) aa 

(ii) aaa 

(iii) aaaa 

(iv) aabaab 

(v) abab 

9. The language this TM accepts is DOUBLEWORD, the set of all words of the form ss, 
where s is a nonnull string in (a + b)* (see p. 200). 

(i) Explain the meaning of each state and prove that all words in DOUBLEWORD are 
accepted by this TM. 

(ii) Show that all words not in DOUBLEWORD are rejected by this machine. 

10. (i) Show that states 11 and 12 can be combined without changing the language. 

(ii) What other changes can be made? 
11. An alternate TM to accept EVEN-EVEN can be based on the algorithm:

1. Move up the string, changing a’s to A’s.

2. Move down the string, changing b’s to B’s.

We can modify this algorithm in the following way: To avoid the problem of crashing on 
the way down the Tape, change the letter in the first cell to X if it is an a and to Y if it is 
a b. This way, while charging down the Tape, we can recognize when we are in cell i. 
Draw this TM. 

12. Follow the up-down method for a TM that recognizes EVEN-EVEN as explained in 
Problem 11 but use INSERT, not the X, Y trick, to build the TM. 

13. Build a TM that accepts the language EVEN-EVEN based on the subroutine DELETE 
given in this chapter. 

14. In the subroutine INSERT given in this chapter, is it necessary to separate states 6 and 7 
or can they somehow be combined? 

15. On the TM given in this chapter for the language {a^n b^n a^n}, trace the following words:

(i) aabbaa 

(ii) aabbaaa 

(iii) aabaa 

(iv) aabbaabb 

(v) Characterize the nature of the different input strings that crash in each of the eight 
states. 

16. Build a TM to accept the language {a^n b^n a^n} based on the following algorithm:

(i) Check that the input is in the form a*b*a*. 

(ii) Use DELETE in an intelligent way. 

17. Trace the subroutine DELETE in the following situations: 


18. Draw a TM that does the same job as DELETE, but leaves the Tape Head pointing to
the first blank cell. One way to do this is by reading a letter, putting it into the cell
behind it, and moving two cells up the Tape.

19. (i) Draw a TM that loops forever on all words ending in a and crashes on all others.

(ii) Draw a TM that loops forever on the input string bab, leaving the Tape different
each time through the loop.

20. Draw a TM that accepts the language PALINDROME', the complement of
PALINDROME. This is, although we did not prove so, a context-free language.


CHAPTER 20 


Post Machines 


THE POST MACHINE 

We have used the word “algorithm” many times in this book. We have tried to explain what an
algorithm is by saying that it is a procedure with instructions so carefully detailed that no further
information is necessary. The person/machine executing the algorithm should know how to
handle any situation that may possibly arise. Without the need for applying any extra intelligence, it
should be possible to complete the project. Not only that, but before even beginning we should
be able, just by looking at the algorithm and the data, to predict an upper limit on the number of
steps the entire process will take. This is the guarantee that the procedure is finite.

All this sounds fine, but it still does not really specify what an algorithm is. This is an
unsatisfactory definition, because we have no precise idea of what a “procedure” is.
Essentially, we have merely hidden one unknown word behind another. Intuitively, we know that
arithmetic operations are perfectly acceptable steps in an algorithm, but what else is? In
several algorithms, we have allowed ourselves the operation of painting things blue without
specifying what shade or how many coats. An algorithm, it seems, can be made of almost
anything.

The question of determining the appropriate components for mathematical algorithms
was of great interest earlier in this century. People were discovering that surprisingly few
basic operations were sufficient to perform many sophisticated tasks, just as shifting and
adding are basic operations that can be used to replace hard-wired multiplication in a
computer. The hope was to find a small set of basic operations and a machine that could perform
them all, a kind of “universal algorithm machine,” because it could then run any algorithm.
The mathematical model itself would provide a precise definition of the concept of
algorithm. We could use it to discuss in a meaningful way the possibility of finding algorithms
for all mathematical problems. There may even be some way to make it program itself to
find its own algorithms so that we need never work on mathematics again.

In 1936, the same fruitful year Turing introduced the Turing machine, Emil Leon Post
(1897-1954) created the Post machine, which he hoped would prove to be the “universal
algorithm machine” sought after. One condition that must be satisfied by such a “universal
algorithm machine” (we retain the quotation marks around this phrase for now because we
cannot understand it in a deeper sense until later) is that any language which can be
precisely defined by humans (using English, pictures, or hand signals) should be accepted (or
recognized) by some version of this machine. This would make it more powerful than an FA
or a PDA. There are nonregular languages and non-context-free languages, but there should
not be any non-Turing or non-Post languages. In this part of the book, we shall see to what
extent Post and Turing succeeded in achieving their goals.


DEFINITION 

A Post machine, denoted PM, is a collection of five things: 

1. The alphabet Σ of input letters plus the special symbol #. We generally use Σ = {a, b}.

2. A linear storage location (a place where a string of symbols is kept) called the STORE,
or QUEUE, which initially contains the input string. This location can be read, by
which we mean the leftmost character can be removed for inspection. The STORE can
also be added to, which means a new character can be concatenated onto the right end of
whatever is there already. We allow for the possibility that characters not in Σ can be
used in the STORE, characters from an alphabet Γ called the store alphabet.

3. READ states, for example,

[Figure: a READ state with one outgoing branch per character it may read, including a Λ branch]

which remove the leftmost character from the STORE and branch accordingly. The only
branching in the machine takes place at the READ states. There may be a branch for
every character in Σ or Γ. Note the Λ branch that means that an empty STORE was
read. PMs are deterministic, so no two edges from the READ have the same label.

4. ADD states:

[Figure: examples of ADD states]

which concatenate a character onto the right end of the string in the STORE. This is
different from PDA PUSH states, which concatenate characters onto the left. Post
machines have no PUSH states. No branching can take place at an ADD state. It is possible
to have an ADD state for every letter in Σ and Γ.

5. A START state (unenterable) and some halt states called ACCEPT and REJECT:

[Figure: the START, ACCEPT, and REJECT states]




If we are in a READ state and there is no labeled edge for the character we have 
read, then we crash, which is equivalent to taking a labeled edge into a REJECT state. 
We can draw our PMs with or without REJECT states. ■ 

The STORE is a first-in first-out (FIFO) stack in contradistinction to a PUSHDOWN or
last-in first-out (LIFO) STACK. The contents of an originally empty STORE after the
operations

ADD a → ADD b → ADD b

is the string

abb

If we then read the STORE, we take the a branch and the STORE will be reduced to bb.

A Post machine does not have a separate INPUT TAPE unit. In processing a string, we
assume that the string was initially loaded into the STORE and we begin executing the
program from the START state on. If we wind up in an ACCEPT state, we accept the input
string. If not, not. At the moment we accept the input string, the STORE could contain
anything. It does not have to be empty, nor need it contain the original input string.
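A Python `collections.deque` behaves like the STORE described here when characters are added with `append` (ADD, on the right) and removed with `popleft` (READ, from the left). The short sketch below replays ADD a, ADD b, ADD b and a single READ, and contrasts the result with a Python list used as a LIFO STACK; the variable names are ours.

```python
from collections import deque

# The STORE: ADD appends on the right, READ removes from the left (FIFO).
store = deque()
for ch in "abb":              # ADD a, ADD b, ADD b
    store.append(ch)
print("".join(store))         # abb

first = store.popleft()       # READ: take the a branch
print(first, "".join(store))  # a bb

# A PDA STACK for contrast: PUSH and POP work at the same end (LIFO).
stack = []
for ch in "abb":
    stack.append(ch)
print(stack.pop())            # b, the last character pushed comes off first
```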

As usual, we shall say that the language defined (or accepted) by a Post machine is the
set of strings that it accepts. A Post machine is yet another language-recognizer or -acceptor.
As we have defined them, Post machines are deterministic, that is, for every input string
there is only one path through the machine; we have no alternatives at any stage. We could
also define a nondeterministic Post machine, NPM. This would allow for more than one
edge with the same label to come from a READ state. It is a theorem that, in their strength as
language-acceptors, NPM = PM. This we shall discuss in Chapter 22.

Let us study an example of a PM. 

EXAMPLE 

Consider the PM below: 



As required by our definition, this machine is deterministic. We have not drawn the 
edges that lead to REJECT states, but instead we allow the path to crash in the READ state if 
there is no place for it to go. 

Let us trace the processing of the input aaabbb on this PM: 
STATE      STORE
START      aaabbb
ADD #      aaabbb#    (Note this point.)
READ₁      aabbb#
READ₂      abbb#
ADD a      abbb#a
READ₂      bbb#a
ADD a      bbb#aa
READ₂      bb#aa
READ₃      b#aa
ADD b      b#aab
READ₃      #aab
ADD b      #aabb
READ₃      aabb
ADD #      aabb#      (Note this point.)
READ₁      abb#
READ₂      bb#
ADD a      bb#a
READ₂      b#a
READ₃      #a
ADD b      #ab
READ₃      ab
ADD #      ab#        (Note this point.)
READ₁      b#
READ₂      #
READ₃      Λ
ADD #      #          (Note this point.)
READ₁      Λ
ACCEPT


The trace makes clear to us what happens. The # is used as an end-of-input string signal
(or flag). In READ₁, we check to see whether we are out of input; that is, are we reading the
end-of-input signal #? If so, we accept the string. If we read a b, the string crashes. So,
nothing starting with a b is accepted. If the string starts with an a, this letter is consumed by
READ₁; that is, the trip from READ₁ to READ₂ costs one a that is not replaced. The loop at
READ₂ puts the rest of the a’s from the front cluster of a’s behind the #. The first b read is
consumed in the trip from READ₂ to READ₃. At READ₃, the rest of the first cluster of b’s is
stripped off the front and appended onto the back, behind the a’s that are behind the #.


After the b’s have been transported, we expect to read the character #. If we read an a,
we crash. To survive the trip back from READ₃ to ADD #, the input string must have been
originally of the form a*b*.

In each pass through the large circuit READ₁-READ₂-READ₃-READ₁, the string loses
an a and a b. Note the markers we have indicated along the side. To be accepted, both a’s
and b’s must run out at the same time, since if there were more a’s than b’s, the input string
would crash at READ₂ by reading a # instead of b, and if the input string had more b’s than
a’s, it would crash in state READ₁ by reading a b.

Therefore, the language accepted by this PM is {a^n b^n} (in this case, including Λ). ■
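The PM’s behavior can be checked mechanically. The following Python sketch assumes a transition table reconstructed from the explanation above (the diagram itself is not reproduced, so state names such as `ADDa` are our own labels), uses a `deque` as the STORE, and records a (state, STORE) row after every READ or ADD, in the same form as the trace table.

```python
from collections import deque

# Transition table reconstructed from the prose; START leads straight to ADD #.
# READ states map the character removed from the left of the STORE to the next
# state (None stands for reading an empty STORE). ADD states append one character.
PM_ANBN = {
    "ADD#": ("ADD", "#", "READ1"),
    "READ1": ("READ", {"#": "ACCEPT", "a": "READ2"}),
    "READ2": ("READ", {"a": "ADDa", "b": "READ3"}),
    "ADDa": ("ADD", "a", "READ2"),
    "READ3": ("READ", {"b": "ADDb", "#": "ADD#"}),
    "ADDb": ("ADD", "b", "READ3"),
}


def run_pm(machine, word, start="ADD#", max_steps=10_000):
    """Return (accepted, trace), where trace lists (state, STORE after that state)."""
    store = deque(word)               # the input string is loaded into the STORE
    state, trace = start, []
    for _ in range(max_steps):
        if state == "ACCEPT":
            return True, trace
        entry = machine[state]
        if entry[0] == "ADD":         # concatenate one character onto the right end
            _, symbol, nxt = entry
            store.append(symbol)
        else:                         # READ: remove the leftmost character
            edges = entry[1]
            symbol = store.popleft() if store else None
            if symbol not in edges:
                return False, trace   # no edge for this character: crash
            nxt = edges[symbol]
        trace.append((state, "".join(store)))
        state = nxt
    raise RuntimeError("step budget exhausted; the machine may be looping")


accepted, trace = run_pm(PM_ANBN, "aaabbb")
print(accepted, trace[:4])
```

On aaabbb the recorded rows reproduce the trace table from ADD # down to the final READ₁ on an empty STORE.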

Post machines look considerably like PDAs, and, in fact, PDAs can accept the language
{a^n b^n} as the preceding PM (p. 459) does. However, we have seen that {a^n b^n a^n} is
non-context-free and cannot be accepted by a PDA. So, to show that PMs have some extra power
beyond PDAs, we demonstrate one that accepts this language.

EXAMPLE 

Consider the PM below: 



This machine is very much like the PM in the previous example. We start with a string
in the STORE. We add a # to the back of it. We accept it in state READ₁ if the string was
initially empty. If it starts with a b, we crash. If it starts with an a, we use up this letter getting
to READ₂. Here, we put the entire initial clump of a’s (all the way up to the first b) behind
the #. We read the first b and use it getting to READ₃. Here, we put the rest of the clump of
b’s behind the a’s behind the #. We had then better read another a to get to READ₄. In
READ₄, a bunch of a’s (minus the one it costs to get there) are placed in the STORE on the
right, behind the b’s that are behind the a’s that are behind the #. After we exhaust these a’s,
we had better find a # or we crash. After reading the # off the front of the STORE, we
replace it at the back of the STORE in the state ADD #. To make this return to ADD #, the
input string must originally have been of the form a*b*a*. Every time through this loop we
use up one a from the first clump, one b from the b clump, and one a from the last clump.

The only way we ever get to ACCEPT is to finish some number of loops and find the
STORE empty, because after ADD # we want to read # in state READ₁. This means that
the three clumps are all depleted at the same time, which means that they must have had the
same number of letters in them initially. This means that the only words accepted by this PM
are those of the form {aⁿbⁿaⁿ}. ■
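As a mechanical cross-check of this argument, here is a minimal Python sketch of the PM just described. It is our own illustration, not a transcription of the drawing: the STORE is a `collections.deque` (READ takes from the left, ADD appends on the right), the function name `pm_accepts_anbnan` and the state names are hypothetical, and we assume, as in our drawings, that a missing edge means the machine crashes.

```python
from collections import deque

def pm_accepts_anbnan(word: str) -> bool:
    """Simulate the PM described above for {a^n b^n a^n}.

    Assumption: any missing edge (including READ on an empty STORE) is a crash.
    """
    store = deque(word)          # READ removes from the left, ADD appends on the right
    state = "ADD#"
    while True:
        if state == "ADD#":
            store.append("#")
            state = "READ1"
            continue
        if not store:
            return False         # no edge for an empty STORE in this machine
        ch = store.popleft()     # every READ state removes the leftmost character
        if state == "READ1":
            if ch == "#":
                return True      # the STORE was empty before ADD #
            if ch != "a":
                return False
            state = "READ2"
        elif state == "READ2":   # recycle the first clump of a's behind the #
            if ch == "a":
                store.append("a")
            elif ch == "b":
                state = "READ3"
            else:
                return False
        elif state == "READ3":   # recycle the clump of b's
            if ch == "b":
                store.append("b")
            elif ch == "a":
                state = "READ4"
            else:
                return False
        elif state == "READ4":   # recycle the last clump of a's
            if ch == "a":
                store.append("a")
            elif ch == "#":
                state = "ADD#"
            else:
                return False
```

For instance, `pm_accepts_anbnan("aabbaa")` is True while `pm_accepts_anbnan("aabbaab")` is False. Each trip around the loop removes three letters for good, so the simulation always stops.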

We should not think that we have proven that PMs accept a larger class of languages
than PDAs. We have only demonstrated that PMs accept some context-free languages and
some non-context-free languages. In Chapter 22, we shall show that PMs do, in fact, accept
all CFLs. We shall then have to face the question, “Do they accept all non-CFLs?” This
will be answered in Chapter 24.

Before we relate PMs to PDAs, we shall compare them to TMs, as Post himself did with 
the following three theorems. 


SIMULATING A PM ON A TM 


THEOREM 47 




Any language that can be accepted by a PM can be accepted by some TM. 


PROOF 






As with many theorems before, we prove this one by constructive algorithm. In this case, we 
show how to convert any PM into a TM, so that if we have a PM to accept some language, 
we can see how to build a TM that will process all input strings exactly the same way as the 
PM, leading to HALT only when the PM would lead to ACCEPT. 

We know that PMs are made up of certain components, and we shall show how to
convert each of these components into corresponding TM components that function the same
way. We could call this process simulating a PM on a TM.

The easiest conversion is for the START state, because we do not change it at all. TMs 
also begin all execution at the START state. 

The second easiest conversion is for the ACCEPT state. We shall rename it HALT
because that is what the accepting state is called for TMs.

The next easiest conversion is for the REJECT states. TMs have no reject states; they 
just crash if no path can be found for the letter read by the Tape Head. So, we simply delete 
the REJECT states. (We often do this for PMs too.) 

Now before we proceed any further, we should address the question of converting the
PM’s STORE into the TM’s Tape. The STORE contains a string of letters with the
possibility of some occurrences of the symbol #.

Most often, there will be only one occurrence of the symbol # somewhere in the middle 
of the string, but even though this is usual in practice, it is not demanded by the definition. 

We now describe how we can use the TM Tape to keep track of the STORE. Suppose
the contents of the STORE look like

x₁x₂x₃x₄x₅

where the x's are from the PM input alphabet Σ or the symbol # and none of them is Δ. We
want the corresponding contents of the TM Tape to be


i , 


it 


i 


f; 

m 


I 

ft 


K 



with the Tape Head pointing to one of the x's. Notice that we keep some Δ's on the left of
the STORE information, not just on the right, although there will only be finitely many Δ's
on the left, because the Tape ends in that direction.
We have drawn the TM Tape picture broken because we do not know exactly where the
x's will end up on the Tape. The reason for this is that the PM eats up data from the left of 
the STORE and adds on data to the right. If at some point the STORE contains abb and we 
execute the instructions 

READ-ADD a-READ-ADD a-ADD b-READ
the TM Tape will change like this: 





The non-Δ information wanders up to the right, while Δ's accumulate on the left.
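The drift can be checked mechanically. The following Python sketch is our own illustration (the name `run_on_tape` is hypothetical): it keeps the STORE on a list standing for the TM Tape, with one blank, written `Δ`, in cell i, and records the Tape after each of the six operations READ-ADD a-READ-ADD a-ADD b-READ applied to abb.

```python
BLANK = "Δ"

def run_on_tape(store: str, program: list[str]) -> list[str]:
    """Apply PM operations ("READ" or "ADD x") to a STORE kept on a Tape.

    The Tape starts with one blank in cell i followed by the STORE contents.
    The STORE is always tape[left:]; the list holds no trailing blanks, so
    appending writes on the first blank after the data.
    """
    tape = [BLANK] + list(store)
    left = 1
    snapshots = ["".join(tape)]
    for op in program:
        if op == "READ":
            if left == len(tape):
                raise ValueError("READ on an empty STORE is not handled in this sketch")
            tape[left] = BLANK   # erase the front character of the STORE
            left += 1
        else:
            _, ch = op.split()   # "ADD x" -> x
            tape.append(ch)
        snapshots.append("".join(tape))
    return snapshots

for row in run_on_tape("abb", ["READ", "ADD a", "READ", "ADD a", "ADD b", "READ"]):
    print(row)
```

The rows run Δabb, ΔΔbb, ΔΔbba, ΔΔΔba, ΔΔΔbaa, ΔΔΔbaab, ΔΔΔΔaab; the last one holds the STORE aab, exactly what the PM itself would have.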

Immediately after the START state on the TM, we shall employ the subprogram
INSERT (from Chapter 19) to insert a Δ in cell i and to move the whole non-Δ initial input
string one cell to the right up the Tape.

We do this so that the first PM operation simulated is like all the others in that the non-Δ
information on the TM Tape has at least one Δ on each side of it, enabling us to locate the
rightmost and leftmost ends of the input string by bouncing off Δ's.

There are two operations by which the PM changes the contents of the STORE: ADD and 
READ. Let us now consider how a TM can duplicate the corresponding actions on its Tape. 

If the PM at some point executes the state ADD y, the TM must change its Tape from
something like

Δ Δ x₁ x₂ x₃ x₄ Δ Δ

to

Δ Δ x₁ x₂ x₃ x₄ y Δ Δ

To do this, the Tape Head must move to the right end of the non-Δ characters, locate the
first Δ, and change it to y. This can be done as follows:
(Figure: a TM fragment that moves the Tape Head right across the non-Δ characters and changes the first Δ it finds to y.)

We have illustrated this in the case where Σ = {a, b}, but if Σ had more letters, it would
only mean more labels on the loop. Notice also that we have left the Tape Head again pointing
to some non-Δ character. This is important. We do not want the Tape Head wandering
off into the infinitely many blanks on the right.

There is only one other PM state we have to simulate; that is the READ state. The
READ state does two things. It removes the first character from the STORE, and it
branches in accordance with what it has removed. The other states we have simulated did not
involve branching.

For a TM to remove the leftmost non-Δ character, the Tape Head must move leftward
until the first blank it encounters. It should then back up one cell to the right and read the
non-Δ character in that cell. This it must turn into a Δ and move itself right, never leaving
the string of non-Δ's. This process will require two states in the TM:

(Figure: the first TM state moves left over the non-Δ characters; on reaching a Δ it moves right into the second state, which writes Δ over the character it finds, moves right, and leaves along an edge chosen by the character it erased.)

Notice that we leave the second state along different edges, depending on which character
is being erased. This is equivalent to the branching of the PM READ instruction.

We should also note that because we were careful to insert a Δ in cell i in front of the
input string, we do not have to worry about moving the Tape Head left from cell i and crashing
while searching for the Δ on the left side.

If while processing a given input the STORE ever becomes empty, then the TM Tape
will become all Δ's. It is possible that the PM may wish to READ an empty STORE and
branch accordingly. If this alternative is listed in the PM, it should also be in the TM.

If the Tape is all Δ's, the Tape Head reads the cell it is pointing to, which contains a Δ, and
moves to the right, “thinking” that it is now in the non-Δ section of the Tape. It then reads this
cell and finds another Δ, which it leaves as a Δ, and moves right again. The program branches
along the appropriate edge. Just because the STORE is empty does not mean that the program is
over. We might yet ADD something and continue. The TM simulation can do the same.

Thus, we can convert every PM state to a TM state or sequence of states that have
the same function. The TM so constructed will HALT on all words that the PM sends to
ACCEPT. It will crash on all words that the PM sends to REJECT (or on which the PM
crashes), and it will loop forever on those same inputs on which the PM loops forever. ■


EXAMPLE


Recall that our first PM of this chapter was

(Figure: the PM of the first example of this chapter, redrawn vertically.)




CHAPTER 20 Post Machines 


This PM accepts the language {aⁿbⁿ}.

This time, we have drawn the machine vertically to facilitate its conversion into a TM.
Following the algorithm in the proof, we produce the next machine, where, for the sake of
simplicity, we have omitted the Δ-inserting preprocessor and assume that the input string is
placed on the TM Tape starting in cell ii with a Δ in cell i:


Notice that the states of this TM correspond to the states of the PM as follows:

TM State     Corresponds to PM State
START        START
1            ADD #
2 and 3      READ₁
HALT         ACCEPT
5 and 6      READ₂
7            ADD a
8 and 9      READ₃
10           ADD b


We really should not have put the end-of-proof box on our discussion of Theorem 47 (see
p. 465) as we did, because the proof is not over until we fully understand exactly how the
separately simulated components fit together to form a coherent TM. In the preceding example, we
see that edges between the independently simulated states always have TM labels determined
from the PM. We can now claim to understand the algorithm of Theorem 47. We are not
finished with this example until we have traced the execution of the TM on at least one input.

Let us trace the processing of the input string aabb:


PM state (character read)   TM states   Tape afterward
START                       START       Δaabb
ADD #                       1           Δaabb#
READ₁ (a)                   2, 3        ΔΔabb#
READ₂ (a)                   5, 6        ΔΔΔbb#
ADD a                       7           ΔΔΔbb#a
READ₂ (b)                   5, 6        ΔΔΔΔb#a
READ₃ (b)                   8, 9        ΔΔΔΔΔ#a
ADD b                       10          ΔΔΔΔΔ#ab
READ₃ (#)                   8, 9        ΔΔΔΔΔΔab
ADD #                       1           ΔΔΔΔΔΔab#
READ₁ (a)                   2, 3        ΔΔΔΔΔΔΔb#
READ₂ (b)                   5, 6        ΔΔΔΔΔΔΔΔ#
READ₃ (#)                   8, 9        ΔΔΔΔΔΔΔΔΔ
ADD #                       1           ΔΔΔΔΔΔΔΔΔ#
READ₁ (#)                   2, 3        ΔΔΔΔΔΔΔΔΔΔ
ACCEPT                      HALT


Here, we have decided that the initial Δ's from cell i up to the data are significant and
have included them in the trace.

We can see from this execution chain that this is a TM that accepts {aⁿbⁿ}. We already
know that there are other (smaller) TMs that do the same job. The algorithm was never
guaranteed to find the best TM that accepts the same language, only to prove the existence of one
such TM by constructive algorithm. ■
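The trace above can be reproduced with a short Python sketch. It works under our own assumptions and is not a transcription of the drawn TM: the PM is encoded as a hypothetical dictionary (each state is either an ADD with a character and a next state, or a READ with a table of edges), a missing edge means a crash, and the STORE is kept on a TM-style Tape exactly as in the proof of Theorem 47, with a blank in cell i and Δ's piling up on the left. One snapshot is recorded per simulated PM operation, not per TM move.

```python
BLANK = "Δ"

# Hypothetical encoding of the PM for {a^n b^n}:
# ("ADD", char, next_state) or ("READ", {char: next_state}).
PM_ANBN = {
    "ADD#": ("ADD", "#", "READ1"),
    "READ1": ("READ", {"#": "ACCEPT", "a": "READ2"}),
    "READ2": ("READ", {"a": "ADDa", "b": "READ3"}),
    "ADDa": ("ADD", "a", "READ2"),
    "READ3": ("READ", {"b": "ADDb", "#": "ADD#"}),
    "ADDb": ("ADD", "b", "READ3"),
}

def simulate_on_tape(pm, start, word, max_steps=10_000):
    """Run a PM whose STORE lives on a Tape laid out as in Theorem 47.

    Returns (accepted, snapshots): the Tape after each simulated PM operation.
    A missing edge, or a READ of an empty STORE (no empty-STORE edges are
    modeled here), counts as a crash.
    """
    tape = [BLANK] + list(word)   # blank in cell i, input from cell ii on
    left = 1                      # the STORE is always tape[left:]
    snapshots = ["".join(tape)]
    state = start
    for _ in range(max_steps):
        if state == "ACCEPT":
            return True, snapshots
        kind, *rest = pm[state]
        if kind == "ADD":
            ch, state = rest
            tape.append(ch)       # write on the first blank after the data
        else:
            (edges,) = rest
            if left == len(tape) or tape[left] not in edges:
                return False, snapshots   # crash
            state = edges[tape[left]]
            tape[left] = BLANK    # erase the leftmost STORE character
            left += 1
        snapshots.append("".join(tape))
    raise RuntimeError("step limit reached; the PM may be looping")

accepted, trace = simulate_on_tape(PM_ANBN, "ADD#", "aabb")
```

Here `accepted` is True and `trace` lists the fifteen Tapes in the table, ending with ten Δ's.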


We should note that the alphabet that appears on the TM Tape produced by this
algorithm is the same as the STORE alphabet of the PM.

In the TM we just constructed we have encountered a situation that plagues many 
TMs—piles of tedious multiple-edge labels that all say about the same thing: 
(a, a, L) (b, b, L) (Δ, Δ, L) (#, #, L) (*, *, L)

This is proper TM format for the instruction, “If we read an a, a b, a Δ, a #, or a *, leave
it unchanged and move the Tape Head left.” Let us now introduce a shortened form of this
sentence: (a, b, Δ, #, *; =, L)

DEFINITION 

If a, b, c, d, e are TM Tape characters, then (a, b, c, d, e; =, L) stands for the instructions

(a, a, L) (b, b, L) . . . (e, e, L)

Similarly, we will employ (a, b, c, d, e; =, R) for the set of labels

(a, a, R) (b, b, R) . . . (e, e, R)
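For readers who like to see the abbreviation unfolded mechanically, here is a small Python sketch. The function name `expand_shorthand` is ours, and it assumes the listed characters are single symbols other than the comma and semicolon that serve as separators.

```python
def expand_shorthand(label: str) -> list[tuple[str, str, str]]:
    """Expand "(a, b, Δ, #, *; =, L)" into [("a", "a", "L"), ("b", "b", "L"), ...]."""
    inner = label.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("label must be enclosed in parentheses")
    chars_part, rule_part = inner[1:-1].split(";")
    chars = [c.strip() for c in chars_part.split(",")]
    keep, direction = (p.strip() for p in rule_part.split(","))
    if keep != "=" or direction not in ("L", "R"):
        raise ValueError("expected '=' followed by L or R")
    # '=' means: write back the same character that was read
    return [(c, c, direction) for c in chars]

print(expand_shorthand("(a, b, Δ, #, *; =, L)"))
```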


SIMULATING A TM ON A PM

Before we proceed, it will be useful for us to demonstrate that although a PM is provided 
with only two STORE instructions that seem to correspond to PDA STACK instructions, the 
PM READ and ADD are definitely more flexible. 

THEOREM 48 

There are subprograms that can enable a PM to add a character to the front (left end) of the 
string in the STORE and to read the character off of the back (right end) of the string. 

PROOF

To add a character to the front of the STORE (which corresponds to a PDA PUSH
instruction), we need to know the alphabet of characters already in the STORE and then employ a
new character different from all of them. Let Γ be the character set in the STORE and $ be a
character not in Γ.

Let us say that we wish to add the letter b to the front end of the STORE. What we will do is
first ADD $ to the back of the STORE. Then we ADD b to the back of the STORE. And now we
enter a loop in which we READ whatever is at the front of the STORE and, unless it is a $, we
immediately ADD the very same character to the back of the STORE. This executes a shift-left
cyclically operation. When we do eventually (or immediately) READ the $, we are done, for the
next character is the b we meant to concatenate on the front of the STORE, and this b is
followed by the entire string that used to be in the STORE before the operation began.

The PM subprogram that does this is 
As an example, suppose the STORE originally contained pqr. Then the subprogram would
produce this sequence of STORE changes: 

pqr → pqr$ → pqr$b → qr$b → qr$bp → r$bp → r$bpq → $bpq → $bpqr → bpqr

We will call this subprogram ADD FRONT b. 
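The subprogram can be checked by running it on a `deque` that stands for the STORE, using only the two PM operations: `popleft` for READ and `append` for ADD. This Python sketch (the name `add_front` is ours) returns every intermediate STORE, so the sequence for pqr above can be compared step by step.

```python
from collections import deque

def add_front(store: deque, ch: str, marker: str = "$") -> list[str]:
    """ADD FRONT ch, built from READ (popleft) and ADD (append) only.

    `marker` plays the role of $ and must not already be in the STORE.
    Returns the successive contents of the STORE.
    """
    if marker in store:
        raise ValueError("the marker must not occur in the STORE")
    history = ["".join(store)]

    def record():
        history.append("".join(store))

    store.append(marker)      # ADD $
    record()
    store.append(ch)          # ADD ch
    record()
    while True:
        x = store.popleft()   # READ
        record()
        if x == marker:       # the $ has come around: ch is now at the front
            return history
        store.append(x)       # ADD the same character to the back
        record()

print(add_front(deque("pqr"), "b"))
```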

In order to write a subprogram that reads the back character from the STORE and
branches according to whatever it reads, we will first write a program that takes the last
character and puts it in the front of the STORE, leaving the rest of the string unaltered. We can
then use the regular PM READ instruction to do the branching. So, what we will write is a
program called SHIFT-RIGHT CYCLICALLY.

To do this, the basic strategy is to stick a marker (the $ will do again as long as it is not in
the STORE character set Γ) onto the back of the STORE string. We then read two characters
from the left of the STORE and, by being in an appropriate memory state, we ADD the first
character to the back of the STORE, provided that the second character is not the $. We still have the
second character that we have not yet added to the STORE, and we will not do so unless what
we READ next (the third character) is not the $ either. We keep this third character in mind (in
memory by virtue of a specialized state) until we have read the fourth and found it is not the $
yet. Eventually, we do encounter the $ and we know that the character we are holding in
memory (the character before the $) was originally the last character in the STORE, and we add it on
to the front of the STORE by the ADD FRONT subprogram we have just produced above.
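Here is a Python sketch of the same strategy, again with a `deque` standing for the STORE. Where the PM remembers the held character by being in a different READ state for each letter of Γ, the sketch keeps it in the variable `held`. The names `shift_right_cyclically` and `read_back` are ours, and the compact `add_front` is built from the same two operations as before.

```python
from collections import deque

def add_front(store: deque, ch: str, marker: str = "$") -> None:
    """ADD FRONT ch using only READ (popleft) and ADD (append)."""
    store.append(marker)
    store.append(ch)
    while (x := store.popleft()) != marker:
        store.append(x)

def shift_right_cyclically(store: deque, marker: str = "$") -> None:
    """Move the last character of the STORE to the front (marker not in STORE)."""
    store.append(marker)                  # ADD $
    held = store.popleft()                # READ the first character
    if held == marker:
        return                            # the STORE was empty
    while True:
        x = store.popleft()               # READ the next one, still holding `held`
        if x == marker:
            add_front(store, held, marker)  # `held` was the last character
            return
        store.append(held)                # not last after all: ADD it to the back
        held = x

def read_back(store: deque, marker: str = "$"):
    """READ BACK: SHIFT-RIGHT CYCLICALLY, then an ordinary READ.
    Returns None for an empty STORE."""
    shift_right_cyclically(store, marker)
    return store.popleft() if store else None
```

For example, a STORE holding pqr becomes rpq after `shift_right_cyclically`, and `read_back` on abc returns c and leaves ab.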

This then is the subprogram for SHIFT-RIGHT CYCLICALLY: 


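(Figure: SHIFT-RIGHT CYCLICALLY for the STORE alphabet {a, b, c}. A $ read first means the STORE was empty. States READ₂, READ₃, and READ₄ hold an a, a b, or a c, respectively; their ADD a, ADD b, and ADD c states return the held character to the back of the STORE, and the remaining edges are abbreviated as “Go to READ₂,” “Go to READ₃,” and “Go to READ₄.”)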




We have not drawn in the full spaghetti of edges but used the direction “go to READ
such-and-such.” We have used the old trick of the subprogram INSERT, of remembering what
character has been read by being in a different READ state for each possibility. Thus, READ₂
remembers that the character we owe to the STORE is an a and it will be added to the back
unless the next character is a $, in which case it will be added to the front. When we ascertain
that the next character is a c, we ADD a and then go to READ₄ to determine to which end of
the STORE to add the c.

As we mentioned already, the full subprogram of reading the right-end character of the
STORE, which we call READ BACK, is

(Figure: READ BACK, which is SHIFT-RIGHT CYCLICALLY followed by an ordinary READ.)

All told, we can read or add to either end of the STORE.

We are now in position to simulate a full TM on a PM.

We have shown that any language that can be accepted by a PM can also be accepted by
some TM; however, that is only half the story.

THEOREM 49


Any language that can be accepted by a TM can be accepted by some PM.


PROOF


This proof will again be by constructive algorithm. We start by assuming that we have an
appropriate TM for a certain language and from the TM we shall build a PM that operates on
input strings in exactly the same way, step by step. Again, we shall be doing a simulation.

Before continuing with this proof, we should note that we intend to use a STORE
alphabet that is larger than usual. Normally, we expect the STORE to contain the letters of the
alphabet from the input-string language plus the symbol #. Here, we are going to put any
character from the TM Tape alphabet (which can be much larger, with many special symbols)
into the STORE. In particular, the character Δ may have to be placed in the STORE as well
as A, B, C, . . . . If there are any who have philosophical qualms about adding Δ to the
STORE as a character, let them not think of it as a blank but as the first letter of Dionysius. The
simulation will work just as well. The language ultimately accepted by the PM will have
initially only the letters of the input string on the TM, but other characters may be employed in
the processing, just as with TMs.

We already have some feel for the correspondence between these two machines from
Theorem 47 (p. 462). Still, one great problem stands out. In TMs we can read and change a
character in the middle of the string, whereas with PMs we can only read and add onto the
ends of the string. How can PMs simulate the action of TMs? A clever trick is needed here
that makes use of the extra symbol # that PMs have, which we shall assume is not in either
of the TM’s alphabets, Γ or Σ. (If the TM did use this symbol in its Tape alphabet Γ, then
change it to boldface or italics or blue paint without changing the operation of the TM and
freeing # as a symbol special to the PM.)

We shall make a correspondence between # and the position of the Tape Head. The 
character string to the left of the Tape Head on the TM Tape will be placed to the right of 
the symbol # on the PM STORE and the character string to the right of (or at) the Tape 
Head will be placed to the left of #. 

By these confusing words, we mean to describe the correspondence of 


i    ii   iii  iv   v    vi   vii  viii
X₁   X₂   X₃   X₄   X₅   X₆   X₇   X₈   . . .

(Tape Head reading cell iv)


in the TM with 


STORE: X₄X₅X₆X₇X₈#X₁X₂X₃

in the PM. 

Why do we do this? Because when the Tape Head is reading cell iv as it is in the TM
above, it reads the character X₄. Therefore, we must be set to read X₄ in the PM, which
means it had better be the leftmost character in the STORE.

Here comes the beauty of this method of representation. 

Suppose that while the Tape Head is reading cell iv, as above, we execute the
instruction (X₄, Y, R). This leaves us the TM situation:


i    ii   iii  iv   v    vi   vii  viii  ix
X₁   X₂   X₃   Y    X₅   X₆   X₇   X₈    Δ   . . .

(Tape Head reading cell v)


To maintain the correspondence, we must be able to convert the STORE in the PM to 

STORE: X₅X₆X₇X₈#X₁X₂X₃Y

This conversion can be accomplished by the PM instructions (states):

READ (taking the X₄ edge), then ADD Y

The X₄ is stripped off the front and a Y is stuck on the back, a very easy PM operation.
Notice that both TM and PM are now set to read X₅.
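The correspondence and the right move can be tried out in a few lines of Python. In this sketch (our own; a one-character string such as "d" stands for each Xᵢ, and trailing blanks are not stored), `store_from_tape` builds the STORE that corresponds to a TM status and `pm_right_move` performs READ followed by ADD. It only covers right moves that stay inside the non-Δ data.

```python
def store_from_tape(tape: str, head: int) -> str:
    """STORE for a TM status: the cells at and right of the head, then #,
    then the cells left of the head."""
    return tape[head:] + "#" + tape[:head]

def tm_right_move(tape: str, head: int, write: str) -> tuple[str, int]:
    """Direct TM semantics of (X, write, R): overwrite the cell, step right."""
    return tape[:head] + write + tape[head + 1:], head + 1

def pm_right_move(store: str, write: str) -> str:
    """PM simulation of (X, write, R): READ strips X, ADD puts write on the back."""
    return store[1:] + write

tape, head = "abcdefgh", 3            # stands for X1...X8, Tape Head at cell iv
store = store_from_tape(tape, head)   # "defgh#abc"
new_tape, new_head = tm_right_move(tape, head, "Y")
assert pm_right_move(store, "Y") == store_from_tape(new_tape, new_head)  # "efgh#abcY"
```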

Let us pause for a moment to see exactly how this conversion works. On the next page
on the left is a TM that converts the input word “cat” into the word “dog” and crashes on all
other inputs. This TM uses only right Tape Head moves, so we can convert it easily to the
PM on the right using the correspondence shown above:
Notice how the correspondence between Tape and STORE is preserved with every
instruction. Let us return to the simulation.

Suppose instead that we had to simulate a left move; that is, we started with the original
Tape as earlier, with Tape Head reading cell iv, and we were asked to execute the instruction
(X₄, Y, L). This would leave the Tape as


i    ii   iii  iv   v    vi   vii  viii  ix
X₁   X₂   X₃   Y    X₅   X₆   X₇   X₈    Δ   . . .

(Tape Head reading cell iii)

This Tape status corresponds to the STORE contents

X₃YX₅X₆X₇X₈#X₁X₂

This is almost equivalent to the sequence

READ-ADD FRONT Y-SHIFT-RIGHT CYCLICALLY
We say “almost” because we have the problem of what to do when the TM is instructed
to move left when the Tape Head is at cell i. Consider the Tape situation below:

i    ii   iii
X₁   X₂   X₃   . . .

(Tape Head reading cell i)

Here, (X₁, Y, L) causes a crash. Let us see what this instruction means when performed by
the PM simulation.

In our PM version, we would start with the STORE contents

X₁X₂X₃#

We would then execute the sequence READ-ADD FRONT Y-SHIFT-RIGHT CYCLICALLY.
The contents of the STORE change as shown below:

X₁X₂X₃# → X₂X₃# → YX₂X₃# → #YX₂X₃
Because we have agreed in our simulation to keep the character that is in the TM cell
being read by the Tape Head to the left of the # in the PM STORE, the final STORE contents
make no sense. They do somewhat “represent” a crash in that they show that the Tape Head is
not reading anything, but they do not crash the PM. The PM could conceivably still continue
processing the input and eventually reach ACCEPT. To be sure the PM stops processing, we
must include in every PM simulation of a leftward TM move a test to see whether the first
symbol in the STORE has become #.
(Figure: the test added after a leftward move: READ; on # go to REJECT; on a, b, or c, ADD FRONT that same character.)


After we read a non-# character, we stick it back onto the front of the STORE. 

Now we have a completely accurate treatment for (X, Y, L), but we realize that we have
not fully covered the (X, Y, R) case yet. Another difficulty, similar to the problem we have
just treated, arises when we want to move the Tape Head right beyond the non-Δ's on the
Tape. Suppose the TM Tape is

i    ii   iii  iv
X₁   X₂   X₃   Δ   . . .

(Tape Head reading cell iii)

and the TM wants to execute the move (X₃, Y, R).

In the PM simulation of this, the STORE begins by containing

X₃#X₁X₂

and after READ-ADD it contains

#X₁X₂Y

which is again a meaningless formulation in our correspondence because the STORE starts
with a #. When a move right causes the # to be the first character of the STORE, we should
insert a Δ in front of # in the STORE to achieve

Δ#X₁X₂Y

which does correspond to the TM’s Tape status. 

We can do this as before with a test after READ-ADD to see whether the STORE starts
with a #. If it does, instead of crashing, we replace the # and ADD FRONT Δ.
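Putting the pieces together, the following Python sketch (our own illustration; the function names are hypothetical and each Tape cell is a one-character string) simulates one TM instruction on the PM STORE, including the test for a leftward crash and the Δ inserted when a rightward move runs off the data. String slicing stands in for the PM subprograms, with comments naming the subprogram each line plays, and the result can be compared against direct TM semantics.

```python
BLANK = "Δ"

def store_from_tape(tape: str, head: int) -> str:
    """STORE corresponding to a TM status: cells at/right of head, #, cells left of head."""
    return tape[head:] + "#" + tape[:head]

def tm_step(tape: str, head: int, write: str, move: str):
    """Direct TM semantics; returns None on a crash (moving left from cell i)."""
    tape = tape[:head] + write + tape[head + 1:]
    head += 1 if move == "R" else -1
    if head < 0:
        return None
    if head == len(tape):
        tape += BLANK                       # the Tape Head moves onto a fresh blank
    return tape, head

def pm_step(store: str, write: str, move: str):
    """PM simulation of (X, write, move); returns None where the PM would REJECT."""
    if move == "R":
        store = store[1:] + write           # READ, then ADD write
        if store[0] == "#":                 # the head has run past the data
            store = BLANK + store           # put the # back and ADD FRONT Δ
        return store
    store = write + store[1:]               # READ, then ADD FRONT write
    store = store[-1] + store[:-1]          # SHIFT-RIGHT CYCLICALLY
    if store[0] == "#":                     # nothing was left of the head
        return None                         # REJECT, as the TM would crash
    return store
```

For instance, `pm_step("abc#", "Y", "L")` returns None, matching the crash of (X₁, Y, L) at cell i, and `pm_step("c#ab", "Y", "R")` returns "Δ#abY".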
The simulation is almost complete. All branching and Tape modification that the TM
requires can be performed by the PM we have designed. In any case where the TM accepts the
input string by branching to HALT, let the PM accept the string by branching to ACCEPT.

To start the PM, we must make it initially resemble the TM. The TM begins its
processing by having the input string already on its Tape:

i    ii   iii  iv   v
X₁   X₂   X₃   X₄   X₅   Δ   . . .

(Tape Head reading cell i)

while a PM running on the same input according to the rules of PMs must start with the
STORE containing exactly the same input string:

X₁X₂X₃X₄X₅

However, the STORE contents corresponding to the TM status would be

X₁X₂X₃X₄X₅#

To begin correspondence, we have to add a # to the right. Therefore, our initial sequence
in the PM must always be

START-ADD #

In converting a TM into a PM we have the quandary of what to do about a TM START state
that is reentered. In the PM the in-edges will go into this ADD # instead. Now the
correspondence is complete; all words accepted by the TM will be accepted by the PM. All input
strings that crash on one will crash on the other, and all input strings that loop forever on the
TM will do the same on the PM. ■

This is a very inefficient conversion algorithm, so we shall illustrate it on a very small
TM.

EXAMPLE 

Consider this TM: 





This machine accepts all words starting with an a and, in so doing, it turns the input into a
string of solid a's. When converted into a PM by the algorithm above, the resultant
machine is
Here, we have used the abbreviations AF for ADD FRONT and SRC for SHIFT-RIGHT
CYCLICALLY. To understand the equivalence, let us explain the meaning of the READ
states:


READ₁ acts like the reenterable TM START state.

READ₂ is a TAPE-HEAD-reading-Δ checker, as are READ₅ and READ₆.

READ₃ corresponds to TM state 1.

READ₄ is a crash-while-moving-left checker. ■

Taken together, Theorems 47 and 49 tell us that PMs and TMs have the same power. We 
may write 

PM = TM 


PROBLEMS

Problems 1 through 4 refer to the following PM:


1. Trace the paths of the following input strings on this PM. At every step, name the
current state and the contents of the STORE.

(i) abab 

(ii) baabba 

(iii) aaabbb 

(iv) aabbbb 

(v) bbabaaa 

2. (i) Show that if an input string has exactly one more a than b, it will crash on this PM in
state READ₁.

(ii) Show that if an input string has exactly one more b than a, it will crash on this PM
in state READ₃.

(iii) Show that if an input string has more than one more a than b or more than one 
more b than a, then it will loop forever on this PM. 

3. Show that the language accepted by this PM is EQUAL, all words with the same
number of a's and b's.

4. Draw a PM that accepts the language UNEQUAL, the complement of EQUAL. 

5. Draw a PM that accepts the language {aⁿb²ⁿ}. (Hint: Use the subroutine SHIFT-RIGHT
CYCLICALLY.)
6. Draw a PM that accepts the language EVENPALINDROME.

7. (i) Draw a PM that accepts the language ODDPALINDROME. 

(ii) Draw a PM that accepts the language PALINDROME. 

8. Draw a PM that accepts the language EVENPALINDROME' (the complement of 
EVENPALINDROME). 

9. (i) Explain why, even though a PM is deterministic, the complement of a language
accepted by a PM might not be accepted by any PM.

(ii) Find an example of a PM that does not accept the complementary language by
reversing ACCEPT and REJECT states.

(iii) Find a PM that accepts exactly the same language if its ACCEPT and REJECT 
states are reversed. 

10. Prove that all regular languages can be accepted by some PM. (This is not hard. Simply 
follow the line of argument in the proof of Theorem 28, p. 310.) 

11. (i) Convert the following TM into a PM using the algorithm of Theorem 49 (p. 470)
(make use of the subroutine SHIFT-RIGHT CYCLICALLY):

(Figure: a small TM whose edges are labeled (b, b, R), (a, a, R), (Δ, Δ, L), and (b, a, R).)

Run the following input strings on both the TM and PM: 

(ii) a 

(iii) ab 

(iv) abb 

(v) What is the language accepted by the two machines? 

(vi) Build a smaller PM that accepts the same language. 

12. (i) Build a PM that takes in any string of a's and b's and leaves in its STORE the
complement string that has the a's and b's switched.

(ii) Build a PM that takes in any string of a's and b's and exchanges the first and last
letters and then accepts. 

13. (i) Build a PM that accepts the language MIDDLEA of all words that have an a as the
middle letter. (These words obviously must have odd length.)

(ii) Prove that this language is nonregular. 

(iii) Prove that this language is context-free. 

14. Convert the PM built in Problem 13 into a TM by the algorithm in this chapter. 

15. Build a PM that accepts the language MOREA (all words with more a's than b's) by
using the following algorithm:

Step 1 On one pass through the data, look for a pair of consecutive letters that are
unequal and cancel them both.

Step 2 Repeat the operation above until there are no letters to cancel. 

Step 3 If there is an a left, accept the word. 

Run this machine on the following input strings: 


(i) aabb

(ii) aaabb 

(iii) ababa 

(iv) ababab 

16. Build a PM that takes any input from the language defined by (a + b)* and deletes all 
substrings of the form aaa, leaving all else in the word intact. 

17. Build a PM that sorts the letters of a string. That is, if aba is fed in, the machine leaves 
aab in its STORE and accepts. Also, bbbaba becomes aabbbb. 

18. Build a PM that starts with any string s from (a + b)* and leaves s a^length(s).
This is the language TRAILING-COUNT we have seen before (p. 204).

19. (i) Outline a TM that takes any input string of a's and b's and runs to HALT, leaving on
its Tape the same string reversed.

(ii) Outline a PM that does the same thing. 

20. Let L be a language accepted by the PM P. Let the reverse of L be the language of all 
the words in L spelled backward. Prove that there is some PM, G, that accepts transpose 
THE TWO-STACK PDA


We shall soon see that Turing machines are fascinating and worthy of extensive study, but 
they do not seem at first glance like a natural development from the machines that we had 
been studying before. There was a natural extension from FAs to PDAs that made it easy to
prove that all regular languages could also be accepted by PDAs. There is no such natural 
connection between PDAs and TMs; that is, a TM is not a souped-up PDA with extra 
gizmos. 

We found that the addition of a PUSHDOWN STACK made a considerable improvement
in the power of an FA. What would happen if we added two PUSHDOWN STACKs, or
three, or seventy?






DEFINITION

A two-pushdown stack machine, a 2PDA, is like a PDA except that it has two
PUSHDOWN STACKs, STACK₁ and STACK₂. When we wish to push a character x into a
STACK, we have to specify which stack, either PUSH₁ x or PUSH₂ x. When we pop a
STACK for the purpose of branching, we must specify which STACK, either POP₁ or POP₂.
The function of the START, READ, ACCEPT, and REJECT states remains the same. The
input string is placed on the same read-only INPUT TAPE. One important difference is that we
shall insist that a 2PDA be deterministic, that is, branching will only occur at the READ and
POP states and there will be at most one edge from any state for any given character.

Because we have made 2PDAs deterministic, we cannot be certain whether they are
even as powerful as PDAs; that is, we cannot be certain that they can accept every CFL
because the deterministic PDAs cannot.

We shall soon see that 2PDAs are actually stronger than PDAs. They can accept all
CFLs and some languages that are non-context-free.
Consider the 2PDA on the next page:
There are many REJECT states that we have not drawn in. As far as we are concerned, it
is fine for the machine to crash when it reads or pops a character for which there is no path.
This does not make the machine nondeterministic.

We have numbered the READ states but not the POPs because they already have
numeric labels designating which STACK is to be popped and extra numbers would be
confusing.

The first thing that happens to an input string is that the initial clump of a's is stripped
away and put into STACK₁ in a circuit involving READ₁. Then a b takes us into a circuit
involving READ₂, where we pop an a from STACK₁ for every b we read from the INPUT
TAPE. Every time we pass through this circuit, we push a b into STACK₂. When we are
done, we check to make sure that STACK₁ is now empty. If we pass this test, we know that
there were as many b's in the b-clump as a's in the a-clump. We now enter a circuit
involving READ₃ that reads through another clump of a's from the input and matches them against
the number of b's we have put into STACK₂ in the previous circuit. If both the INPUT TAPE
and STACK₂ become empty at the same time, then there were as many a's at the end of the
TAPE as b's in STACK₂. This would mean that the whole initial input string was of the form
aⁿbⁿaⁿ.
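The three circuits can be mirrored in a short Python sketch, with two lists serving as the STACKs. This illustrates the behavior just described rather than copying the drawn machine state by state; in particular, we assume the machine needs at least one b to leave the READ₁ circuit, so the sketch rejects Λ. The name `two_pda_accepts` is ours.

```python
def two_pda_accepts(word: str) -> bool:
    """Deterministic sketch of the 2PDA for {a^n b^n a^n : n >= 1}."""
    stack1, stack2 = [], []
    i, n = 0, len(word)
    # READ1 circuit: PUSH1 each a of the first clump
    while i < n and word[i] == "a":
        stack1.append("a")
        i += 1
    if i == n or word[i] != "b":
        return False                  # assumption: a b is needed to leave READ1
    # READ2 circuit: for each b, POP1 an a and PUSH2 a b
    while i < n and word[i] == "b":
        if not stack1:
            return False              # more b's than a's
        stack1.pop()
        stack2.append("b")
        i += 1
    if stack1:
        return False                  # STACK1 must now be empty
    # READ3 circuit: for each a of the last clump, POP2 a b
    while i < n and word[i] == "a":
        if not stack2:
            return False              # more trailing a's than b's
        stack2.pop()
        i += 1
    # accept only if the INPUT TAPE and STACK2 run out together
    return i == n and not stack2
```

Running it on aabbaa gives True, in agreement with the trace that follows.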

We can check this by processing aabbaa as follows: 


TAPE      STATE      STACK₁   STACK₂
aabbaa    START      Δ        Δ
abbaa     READ₁      Δ        Δ
abbaa     PUSH₁ a    a        Δ
bbaa      READ₁      a        Δ
bbaa      PUSH₁ a    aa       Δ
          READ₁
          PUSH₂ b
          READ₂
          PUSH₂ b
          READ₂
          READ₃
          READ₃
          ACCEPT

So, we see that a 2PDA can accept one language that a PDA cannot. Are there
languages that a 2PDA cannot accept? Is a 3PDA stronger? Is a nondeterministic 2PDA
stronger? Which is stronger, a 2PDA or a PM? The subject could, at this point, become very
confusing. However, many of these questions are settled by a theorem of Marvin Minsky
(1961).


JUST ANOTHER TM 


THEOREM 50 


2PDA = TM


In other words, any language accepted by a 2PDA can be accepted by some TM, and any
language accepted by a TM can be accepted by some 2PDA.


PROOF 


In the first part of this proof, we shall show that if the language L can be accepted by some 
2PDA, then we can construct a TM that will also accept it. There may be several 2PDAs that 
accept L, so we fix our attention on one of them, call it P. 

This demonstration will, of course, be by constructive algorithm. We shall show how to
construct a TM that parallels the actions of the 2PDA. (We have also used the words
"corresponds," "simulates," "duplicates," and "emulates" and the phrase "processes exactly the
same way." These words are not technically different.)

The 2PDA has three locations where it stores information: the INPUT TAPE, STACK₁,
and STACK₂. The TM we build has only one information storage location, the Tape.
Therefore, we must put on the Tape the information found in all three 2PDA locations. There is
other information that is carried in the knowledge of what state we are in, but that will
correspond easily between the 2PDA and the TM.

Suppose at some stage in the process the 2PDA has this status: 

TAPE      X₁ X₂ X₃ X₄

STACK₁    Y₁ Y₂ Y₃ Y₄ Y₅

STACK₂    Z₁ Z₂

where the X's, Y's, and Z's are letters from the input and stack alphabets of the 2PDA. Our
definition of 2PDAs was sketchy and did not mention whether each STACK had its own
alphabet or whether there was some other rule. Because a STACK does not have to use all of
the characters in its STACK alphabet, there is no real difference, so let us assume that the X's
are from Σ and the Y's and Z's from Γ.

In our setup, we encode these three strings on the TM Tape as follows: 

Step 1 Assume the characters # and $ are not used by the 2PDA (if they are, find other 
special symbols). 

Step 2 In the first section of the TM Tape, we store the input string. Initially, we insert a
Δ into cell i, moving the data unchanged up the Tape, and later, as the letters of
input are read by the 2PDA, we change them one by one into Δ's on the TM
Tape. The status of the TM Tape corresponding to the current status of the 2PDA
TAPE as described above after two input letters are read is

Δ Δ Δ X₁ X₂ X₃ X₄ # . . .
In what we have pictured above, two letters from the input string, those that were in cell
ii and cell iii, have been read by the 2PDA and thus converted into Δ's on the TM. Because
the number of letters in the input string cannot be increased (a 2PDA can read its TAPE but
not write on it), we can put a permanent marker "#" on the TM Tape at the end of the input
string before we begin running. Throughout our processing, the marker will stay exactly
where it is. This # will be the home base for the Tape Head. After simulating any action of
the 2PDA, the TM Tape Head will return to the # before beginning its next operation.

In our model, the TM instructions that simulate the operation of the 2PDA state

[Figure: a READ state whose a-edge leads to state X, b-edge to state Y, and Δ-edge to state Z]

must accomplish the following chores:

1. Move the Tape Head to the left to find the rightmost of the front Δ's.

2. Bounce back to the right to find the next input letter to be read; in other words, scan
right for the first non-Δ.

3. If this character is #, the input has been exhausted and we go to state Z; otherwise, we
continue.

4. Change this letter into a Δ and back up one space to the left (so that we do not
accidentally step on the # without knowing it).

5. Branch according to what was read; if it was an a, take an edge to the simulation of
state X, if a b, take an edge to state Y.

6. Before continuing the processing, return the Tape Head to # by moving right until it
is encountered.
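Chores 1 through 6 can be sketched on a Python list standing in for the TM Tape. This is only an illustration under stated assumptions: the helper names `make_tape` and `simulate_read` are ours, list indices replace the cell-by-cell head movement (so the back-up step in chore 4 has no counterpart), and the caller performs the branch of chore 5 on the returned letter.

```python
BLANK = "Δ"  # the book's blank symbol


def make_tape(word: str) -> list[str]:
    """Step 2 layout: a blank inserted in cell i, the input, then the # marker."""
    return [BLANK] + list(word) + ["#"]


def simulate_read(tape: list[str]) -> str:
    """Chores 1-6 for one 2PDA READ; returns the letter read, or '#' if exhausted."""
    home = tape.index("#")  # the Tape Head starts (and ends) on #
    head = home - 1
    # 1. Move left past the unread letters to the rightmost of the front blanks.
    while tape[head] != BLANK:
        head -= 1
    # 2. Bounce right onto the first non-blank cell.
    head += 1
    letter = tape[head]
    # 3. Reading # means the input has been exhausted (branch to state Z).
    if letter == "#":
        return "#"
    # 4. Turn the letter into a blank.
    tape[head] = BLANK
    # 5./6. The caller branches on `letter`; the head conceptually returns to #.
    return letter
```

Because cell i always holds the inserted Δ, the leftward scan in chore 1 can never run off the Tape.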

In TM notation, this looks like this: 

[Figure: the TM subprogram simulating the READ state, states 1 through 5, with edges such as (a, b, #; =, L)]


Notice that we are making use of the multiple instruction notation defined in Chapter 20
on p. 468.

(p, q, r, s; =, R) stands for (p, p, R), (q, q, R), (r, r, R), (s, s, R)
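The abbreviation is purely mechanical, as this tiny sketch shows. The function name `expand` is ours, and it handles only the "=" form used here, meaning "write back whatever was read."

```python
def expand(read_symbols: list[str], move: str) -> list[tuple[str, str, str]]:
    """Expand a label such as (p, q, r, s; =, R) into one instruction per letter."""
    # '=' means each letter is rewritten as itself, so the write equals the read.
    return [(symbol, symbol, move) for symbol in read_symbols]
```

So `expand(["p", "q", "r", "s"], "R")` yields `[("p", "p", "R"), ("q", "q", "R"), ("r", "r", "R"), ("s", "s", "R")]`.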

In state 1, we are looking for the Δ's at the beginning of the Tape. We get to state 2
when we have found one and bounced off to the right, either onto the first letter of the
remainder of the string or else back onto #. If the string was empty when we got to read it, we
follow the edge from state 2 to state 5. The edge from state 5 bounces us off the Δ that is to
the left of # and leaves the Tape Head reading the # as we want.

The reason we make such a fuss about knowing where we leave the Tape Head is not
because it matters in the simulation of any particular step, but because it helps us glue
together the simulated steps. This is somewhat like building a house and returning the
hammer to the tool shed after driving in each nail. It is not efficient, but we never lose the
hammer.


Step 3 The contents of the two PUSHDOWN STACKs will appear to the right of the #
on the TM Tape. We place the contents of STACK₁ on the Tape, then the $
marker, and then the contents of STACK₂. The Tape would then look like this:


Δ Δ Δ X₁ X₂ X₃ X₄ # Y₁ Y₂ Y₃ Y₄ Y₅ $ Z₁ Z₂ Δ . . .
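Here is a sketch of this layout as a Python list, one list element per Tape cell. The function name `encode` and its parameters are ours; we assume each STACK is given top-first, so that its top lands in the cell just right of its marker, which is where the POP simulations described next expect it.

```python
BLANK = "Δ"


def encode(word: str, letters_read: int, stack1: str, stack2: str) -> list[str]:
    """Put a 2PDA's INPUT TAPE, STACK1, and STACK2 on one TM Tape (Steps 1-3)."""
    tape = [BLANK]                       # the blank inserted in cell i
    tape += [BLANK] * letters_read       # input letters already read
    tape += list(word[letters_read:])    # input letters still to be read
    tape.append("#")                     # permanent home base for the Tape Head
    tape += list(stack1)                 # STACK1, top first
    tape.append("$")
    tape += list(stack2)                 # STACK2, top first
    tape.append(BLANK)                   # the rest of the Tape is blank
    return tape
```

With input abab, two letters read, STACK₁ = aa, and STACK₂ = b, this produces the cells Δ Δ Δ a b # a a $ b Δ.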


To simulate a POP₁ instruction, we move the Tape Head from the # one cell to the right,
branch on what we read, and return the Tape Head to the same cell it just read, and along
each branch we run the TM subprogram DELETE. If we deleted first, we would not remember
what the character used to be. After simulating the POP₁, we return the Tape Head safely
to point to the # again. The 2PDA state

[Figure: the 2PDA state POP₁, branching on a, b, and c]

becomes

[Figure: a TM subprogram with edges labeled (a, a, L), (b, b, L), (c, c, L), (#, #, R), and (any, =, L), with a DELETE on each branch]

Here, we have used the label (any, =, L) to mean "whatever is read, write the same thing,
and move the Tape Head to the left." We should also note that popping an empty STACK₁ is
the same as reading the $ right after the #.

To simulate a PUSH₁ X, we move the Tape Head one cell to the right and run the TM
subprogram INSERT X. We then return the Tape Head to point to # by moving two cells to
the left:


[Figure: PUSH₁ X becomes (#, #, R), then INSERT X, then (any, =, L) and (any, =, L)]
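The stack operations act at fixed landmarks on the Tape: POP₁ and PUSH₁ work in the cell just right of #, and POP₂ and PUSH₂ (described below) work in the cell just right of $. A minimal sketch, assuming the Step 3 layout with each STACK stored top-first in a Python list: `list.pop` and `list.insert` stand in for the DELETE and INSERT subprograms, returning `None` models popping an empty STACK, and all helper names are ours.

```python
from typing import Optional

BLANK = "Δ"


def pop_after(tape: list[str], marker: str) -> Optional[str]:
    """Read and DELETE the cell just right of `marker`; None means an empty STACK."""
    i = tape.index(marker) + 1
    if i == len(tape) or tape[i] in ("$", BLANK):
        return None          # the next marker or a blank: this STACK is empty
    return tape.pop(i)       # DELETE: later cells shift one cell left


def push_after(tape: list[str], marker: str, x: str) -> None:
    """INSERT x just right of `marker`; later cells shift one cell right."""
    tape.insert(tape.index(marker) + 1, x)


def pop1(tape: list[str]) -> Optional[str]:
    return pop_after(tape, "#")


def push1(tape: list[str], x: str) -> None:
    push_after(tape, "#", x)


def pop2(tape: list[str]) -> Optional[str]:
    return pop_after(tape, "$")


def push2(tape: list[str], x: str) -> None:
    push_after(tape, "$", x)
```

Reading before deleting matters here too: `pop_after` inspects the cell first and only then removes it, just as the TM branches before running DELETE.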


To simulate a POP₂, we advance the Tape Head up the Tape to the cell one past the $.
We read this cell, branch on what we find, return to it, and delete it, as with POP₁. Again,
we return the Tape Head to the #-cell.

[Figure: POP₂ becomes a TM subprogram that moves right with (any non-$, =, R), steps past the $ with ($, $, R), branches on the letter read, runs DELETE, and returns to # with (any non-#, =, L)]

The label (any non-$, =, R) means that we move the Tape Head right without changing
the contents of the Tape, and we stay in the same state until we read the $. The label (any
non-#, =, L) has an analogous meaning. It takes half the subprogram to return the Tape
Head.

To simulate a PUSH₂ X, we advance the Tape Head one cell past the $ and run the TM
subprogram INSERT X. We then return the Tape Head to its usual position.

[Figure: PUSH₂ X becomes a TM subprogram that moves right with (any non-$, =, R), steps past the $ with ($, $, R), runs INSERT X, and returns to # with (any non-#, =, L)]

When the 2PDA branches to an ACCEPT state, we enter the TM HALT state and accept the
input string.

The individual parts fit together perfectly because each component finds the Tape Head
pointing to # and leaves it in the same place.

End of steps.

So far, we have proven only half of Minsky's theorem. We have shown that TMs can do
everything 2PDAs can do. We still have to show that any language accepted by a TM can be
accepted by some 2PDA.

To make the proof of this section easier, we shall prove that any language accepted by a
PM can be accepted by some 2PDA. By Theorem 49 (p. 470), this implies that 2PDAs can
do anything TMs can do, and so it is enough to prove our result.

These two machines are already considerably closer to each other than TMs and
2PDAs, because both 2PDAs and PMs operate on the ends of storage locations with
instructions inside states. In TMs, the instructions are on the edges; a Tape is much more
complex to access, because we can read and write in its middle. We shall show how
STACK₁ (on the 2PDA) can act in as versatile a manner as the STORE (on the PM) with
the help of her brother STACK₂.

The PM starts with the input string already in the STORE, so we must transfer the input
string from the TAPE of the 2PDA into STACK₁. We do this as follows:

[Figure: START, then a READ loop that pushes each input letter onto STACK₂, then a POP₂ loop that pushes each letter onto STACK₁]

We took the letters from the TAPE and put them first in STACK₂. But because of the
nature of a PUSHDOWN STACK, the string was reversed. If the input was initially aabb, what
can be read from STACK₂ is bbaa. When it is transferred again to STACK₁, the input string
is reversed once more to become aabb as it was on the TAPE, so that POP₁ now has an a as
the first letter. The TAPE is now empty, and so we never refer to it again.

The two states with which a PM operates on its STORE are READ and ADD. The
READ is a branch instruction and corresponds completely to the 2PDA instruction POP₁,
eliminating the leftmost character and branching accordingly.

The ADD instruction does not correspond so directly to any 2PDA instruction, because
PUSH₁ introduces a new character on the left of the string in STACK₁, whereas ADD
introduces a new character on the right of the string in the PM's STORE.

We can, however, simulate the action of ADD X with the following set of 2PDA
instructions:

[Figure: a POP₁ loop that moves each character of STACK₁ onto STACK₂, then PUSH₁ X, then a POP₂ loop that moves each character back onto STACK₁]

Here, we first empty STACK₁ into STACK₂ (in STACK₂ the contents appear backward),
then we insert the character X in STACK₁, and then we read back the string from STACK₂
into STACK₁ (it is back in correct order now). The net result is that we have an additional X
on the right of the string in STACK₁, which means at the bottom of the stack.

[Figure: successive contents of STACK₁ and STACK₂ while ADD X is simulated]

STACK₂ is used only to initialize STACK₁ and to simulate the ADD instruction, and for
no other purpose.
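The whole PM-to-2PDA correspondence for the STORE (the initial transfer, READ as POP₁, and ADD through STACK₂) fits in a small class. This is a sketch under our own assumptions: Python lists serve as the two STACKs with the end of each list as its top, `None` stands for reading an empty STORE, and the class and method names are hypothetical.

```python
from typing import Optional


class StoreOn2PDA:
    """A PM STORE kept in STACK1, with STACK2 as scratch space (sketch)."""

    def __init__(self, word: str) -> None:
        self.stack1: list[str] = []
        self.stack2: list[str] = []
        for letter in word:                  # READ each TAPE letter, PUSH2 it
            self.stack2.append(letter)
        while self.stack2:                   # POP2 each letter, PUSH1 it
            self.stack1.append(self.stack2.pop())

    def read(self) -> Optional[str]:
        """PM READ, simulated by POP1; None stands for an empty STORE."""
        return self.stack1.pop() if self.stack1 else None

    def add(self, x: str) -> None:
        """PM ADD x: empty STACK1 into STACK2, PUSH1 x, then move everything back."""
        while self.stack1:
            self.stack2.append(self.stack1.pop())
        self.stack1.append(x)                # x lands at the bottom of STACK1
        while self.stack2:
            self.stack1.append(self.stack2.pop())

    def contents(self) -> str:
        """The STORE read left to right, i.e. STACK1 from top to bottom."""
        return "".join(reversed(self.stack1))
```

Starting from aabb, one READ returns a and leaves abb; an ADD x then leaves abbx, with x at the bottom of STACK₁, exactly as described above.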

The only other states a PM has are REJECT and ACCEPT, and those stay completely 
the same in the 2PDA. Therefore, we have finished describing this conversion process. We 
can completely simulate a PM on a 2PDA. Because we can simulate a TM on a PM, we can 
conclude that we can simulate a TM on a 2PDA. 

This completes the proof of Minsky’s theorem. ■ 

To illustrate the action of the algorithms in the proof, we shall now present the
mandatory examples of a 2PDA converted into a TM and a PM converted into a 2PDA. In both
cases, the conversion does not change the language accepted by the machine.










EXAMPLE 

No higher purpose would be served by constructing a 3000-state TM corresponding to a
complicated 2PDA, so we choose a very simple 2PDA and claim that it is pedagogically
sufficient.

One of the simplest 2PDAs is shown below: 










This machine accepts all words beginning with a and crashes on all words beginning
with b because POP₂ cannot produce an a.

Many simple TMs can accept this language, but to know this, we must understand the 
language. If we automatically follow the algorithm described in the proof of Theorem 50, we 
then produce a TM that must accept the same language as this 2PDA whether we know how 
to characterize the language by some simple English sentence or not. That is the whole point 
of “proof by constructive algorithm.” 

The TM we must build is shown below: 


[Figure: the resulting TM, with edges labeled (a, b; =, R), (a, b, #; =, L), (any non-$, =, R), (a, a, L), ($, $, R), and (any, =, L)]


The pleasure of running strings on this machine is reserved for Problem 16. 

EXAMPLE 

Consider the following PM: 


[Figure: the PM, which includes an ADD b state]


In the problem section of the last chapter, this was seen to accept the language EQUAL
of all strings with the same total number of a's and b's (cf. p. 477).
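Since the PM itself is not reproduced here, the following is only an illustration of how a program restricted to PM operations can decide EQUAL; it is our own construction, not the book's machine. A Python `collections.deque` plays the STORE, `popleft` plays READ, and `append` plays ADD. Each pass around the STORE cancels one a against one b, and a `#` marks the end of a pass.

```python
from collections import deque


def pm_style_equal(word: str) -> bool:
    """Decide EQUAL using only READ (popleft) and ADD (append) on a STORE."""
    store = deque(word)
    store.append("#")                    # ADD # to mark the end of a pass
    while True:
        cancelled_a = cancelled_b = False
        while True:
            c = store.popleft()          # READ
            if c == "#":
                break
            if c == "a" and not cancelled_a:
                cancelled_a = True       # drop the first a of this pass
            elif c == "b" and not cancelled_b:
                cancelled_b = True       # drop the first b of this pass
            else:
                store.append(c)          # ADD the letter back on the right
        if cancelled_a != cancelled_b:
            return False                 # an unmatched a or b: REJECT
        if not cancelled_a:
            return True                  # no letters were left: ACCEPT
        store.append("#")                # ADD # and start another pass
```

The loop terminates because every continuing pass removes one a and one b from the STORE.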

When we convert this into a 2PDA by the algorithm described in the proof of Minsky’s 
theorem, we obtain the following: 


[Figure: the resulting 2PDA, built from START, READ, PUSH₁ and PUSH₂ states for a and b, and ACCEPT]


Tracing words through this machine is left to the Problems section. 


THEOREM 51

Any language accepted by a PDA with n STACKs (where n is 2 or more), called an nPDA,
can also be accepted by some TM. In power we have

nPDA = TM    if n ≥ 2


PROOF 

We shall sketch very quickly how the action of a 3PDA can be simulated by a TM as an
illustration of the general idea.

Suppose that we have a 3PDA that is running on a certain input string. In the middle of
the process, we have some information on the INPUT TAPE and in the STACKs. Suppose
the status is

TAPE      w₁ w₂ w₃ w₄

STACK₁    x₁ x₂

STACK₂    y₁ y₂ y₃ y₄ y₅

STACK₃    z₁ z₂ z₃

We want to represent all of this on the Tape of the TM as

Δ . . . Δ w₁ w₂ w₃ w₄ #₁ x₁ x₂ #₂ y₁ y₂ y₃ y₄ y₅ #₃ z₁ z₂ z₃ Δ . . .
Instead of inventing new characters, we let the kth STACK be marked by the starting symbol
#ₖ. The operation of the conversion is so obvious that anyone who requires a further
explanation will not understand it when it is presented.
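For readers who nonetheless want something concrete, here is a sketch of the one-Tape layout for any number of STACKs, plus a POP on STACK k. The assumptions are ours: each marker #ₖ is modeled as the string "#k" occupying a single cell, each STACK is given top-first, and the function names are hypothetical.

```python
from typing import Optional

BLANK = "Δ"


def encode_npda(word: str, letters_read: int, stacks: list[str]) -> list[str]:
    """One-Tape layout for an nPDA; STACK k follows the marker '#k'."""
    tape = [BLANK] * (1 + letters_read) + list(word[letters_read:])
    for k, stack in enumerate(stacks, start=1):
        tape.append(f"#{k}")   # one cell holding the marker for STACK k
        tape.extend(stack)     # STACK k, top first
    tape.append(BLANK)
    return tape


def pop_k(tape: list[str], k: int) -> Optional[str]:
    """POP from STACK k: read and delete the cell right after '#k'."""
    i = tape.index(f"#{k}") + 1
    if tape[i] == BLANK or tape[i].startswith("#"):
        return None            # the next marker or a blank: STACK k is empty
    return tape.pop(i)
```

With two STACKs this is the earlier # and $ scheme under different marker names.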

So a TM can accept anything that an nPDA can. Obviously, an nPDA can accept anything
a 2PDA can, which is anything a TM can.

Therefore, in power

nPDA = TM    for n ≥ 2 ■

Once we reach the level of a TM, it is hard to go farther. There is good reason to believe 
that it is impossible to go farther, but that is a discussion for Chapter 25. 

Symbolically, we can represent the power comparison of our various mathematical 
models of machines as follows: 

FA = TG = NFA < DPDA < PDA < 2PDA = nPDA = PM = TM

(Note that, as of this point, we have not yet proven that the 2PDA is definitely stronger than
the PDA, because a PDA may be nondeterministic while our 2PDAs are deterministic, but we shall do so soon.)

The underlying structure of this book is now finally revealed:

PART I      FA  = 0PDA
PART II     PDA = 1PDA
PART III    TM  = 2PDA


The machines in our highest class are all deterministic. Perhaps a nondeterministic nPDA
(NnPDA), a nondeterministic Post machine (NPM), or a nondeterministic Turing machine
(NTM) would be even stronger. In the next chapter, we shall see that this is not the case. All
these nondeterministic machines are only equivalent in power to the TM, not stronger. We
have gone about as far as we can go.


PROBLEMS 


Consider the following 2PDA: 



1. Trace the execution of these input strings on this machine. 

(i) aabb 

(ii) babab 

2. Prove that the language accepted by this 2PDA is the language EQUAL. 

3. Draw a 3PDA that accepts the language $\{a^n b^n a^n b^n\}$.

4. Draw a PM that accepts the language $\{a^n b^n a^n b^n\}$.

5. Draw a 2PDA that accepts the language $\{a^n b^n a^n b^n\}$.

6. Let us use the alphabet Σ = {a b c d}. Build a 3PDA that accepts the language

$\{a^n b^n c^n d^n\}$.

7. Outline a 2PDA that accepts the language defined in the previous problem. 

Let us define the language VERYEQUAL over the alphabet Σ = {a b c} as all strings
that have as many total a's as total b's as total c's (see p. 375):

VERYEQUAL = {abc acb bac bca cab cba aabbcc aabcbc . . .}

8. Draw a TM that accepts VERYEQUAL.

9. Draw a PM that accepts VERYEQUAL. 

10. (i) Draw a 3PDA that accepts VERYEQUAL. 

(ii) Draw a 2PDA that accepts VERYEQUAL. 

11. Draw a 2PDA that accepts the language EVEN-EVEN and keeps at most two letters in 
its STACKs. 




12. Draw a 2PDA that accepts MIDDLEA (see p. 478). 

13. Outline a 2PDA that accepts PALINDROME. 

14. Draw a 2PDA that accepts TRAILING-COUNT. (p. 204) 

15. Draw a 2PDA that accepts MOREA. (p. 205) 

16. On the TM that was formed from the 2PDA in the example on p. 489, trace the
execution of the following input strings:

(i) abb 

(ii) baa 

17. On the 2PDA that was formed from the PM in the example on p. 490, trace the
execution of the following input strings:

(i) abba 

(ii) babab 

18. (i) Draw a 3PDA to accept the language $\{a^n b^{2n} c^n\}$ over the alphabet Σ = {a b c}.

(ii) Draw a 2PDA to accept this language. 

(iii) Draw a deterministic PDA that accepts that language. 

19. If L is a language accepted by a 2PDA, prove that TRANSPOSE(L) (p. 91) is also a
language accepted by a 2PDA.

20. (i) Without referring to the material in any other chapter, show that any language that
can be accepted by a 3PDA can be accepted by a 2PDA.

(ii) Generalize. 









THE MOVE-IN-STATE MACHINE

Turing machines can be drawn using different pictorial representations. Let us consider the
diagram below, which looks like a cross between a Mealy and a Moore machine:

Let us call machines drawn in this fashion move-in-state machines. After analyzing the
preceding machine, we shall prove that move-in-state machines have the same power as TMs 
as we originally defined them. 

The action of the move-in-state machine drawn above is to start with any word on
its Tape, leave a space, and make an exact copy of the word on the Tape. If we start with the
word w, we end up with the string wΔw:

baab becomes baabΔbaab

a becomes aΔa

Δ . . . becomes Δ . . .

The algorithm is as follows: We start in state 1. If we read an a, we take the high road:
state 2-state 3-state 4-state 1. If we read a b, we take the low road: state 5-state 6-state
4-state 1. Suppose that we read an a. This is changed into an x as we travel along the edge
labeled a/x to state 2, where the Tape Head is moved right. In state 2, we now skip over all
the a's and b's remaining in w, each time returning to state 2 and moving the Tape Head
right. When we reach the first Δ after the end of w, we take the edge labeled Δ/Δ to state 3.
This edge leaves the Δ undisturbed. The Tape Head is moved by state 3 to the right again. In
state 3, we read through all the letters we have already copied into the second version of w
until we read the first Δ. We then take the Δ/a edge to state 4. Along the edge, we change the
Δ into an a (this is the letter we read in state 1). State 4 moves the Tape Head left, reading
through all the a's and b's of the second copy of w, then through the Δ, and then through the
a's and b's of the part of the original w that has not already been copied.

Finally, we reach the x with which we marked the letter a that we were copying. This
we change back to an a on the edge labeled x/a, y/b going to state 1. State 1 tells us to move
the Tape Head to the right, so we are ready to copy the next letter of w. If this letter is an a,
we take the high road again. If it is a b, we change it to a y and take the route state 5-state 6
to find the blank that we must change to a b in the second copy. Then in state 4, we move the
Tape Head back down to the y, change it back to a b, and return to state 1. When we have
finished copying all of w, state 1 reads a Δ and we halt.
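The machine described above can be run directly from a table of its edges. In this sketch, the edge table is our reading of the prose (the diagram itself is not reproduced), the names `MOVES`, `EDGES`, and `copy_machine` are ours, and a missing table entry would raise a `KeyError`, which plays the role of a crash. The move-in-state rule is built in: the Tape Head moves when a state is entered, except at the very start.

```python
BLANK = "Δ"

# Tape Head move performed on entering each state (HALT has none).
MOVES = {1: "R", 2: "R", 3: "R", 4: "L", 5: "R", 6: "R"}

# (state, letter read) -> (letter written, next state), as described in the text.
EDGES = {
    (1, "a"): ("x", 2), (1, "b"): ("y", 5), (1, BLANK): (BLANK, "HALT"),
    (2, "a"): ("a", 2), (2, "b"): ("b", 2), (2, BLANK): (BLANK, 3),
    (3, "a"): ("a", 3), (3, "b"): ("b", 3), (3, BLANK): ("a", 4),
    (5, "a"): ("a", 5), (5, "b"): ("b", 5), (5, BLANK): (BLANK, 6),
    (6, "a"): ("a", 6), (6, "b"): ("b", 6), (6, BLANK): ("b", 4),
    (4, "a"): ("a", 4), (4, "b"): ("b", 4), (4, BLANK): (BLANK, 4),
    (4, "x"): ("a", 1), (4, "y"): ("b", 1),
}


def copy_machine(word: str) -> str:
    tape = list(word)
    head, state = 0, 1                 # START's own move is not executed at the outset
    while state != "HALT":
        if head == len(tape):
            tape.append(BLANK)         # the Tape is blank beyond the input
        written, state = EDGES[(state, tape[head])]
        tape[head] = written
        if state != "HALT":
            head += 1 if MOVES[state] == "R" else -1  # move on entering the state
    return "".join(tape).rstrip(BLANK)
```

Running it on baa leaves baaΔbaa on the Tape, matching the trace that follows.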

The following is the trace of the operation of this machine on the input string baa:

[Trace: the b in cell i is marked y and copied (yaaΔb), then restored; the a in cell ii is marked x and copied (bxaΔba), then restored; the a in cell iii is marked x and copied (baxΔbaa), then restored; state 1 then reads the Δ in cell iv, and the machine halts with baaΔbaa on the Tape]

This is a new way of writing the program part of a TM; we still use the same old TAPE
and Tape Head. In this picture the edges are labeled as in a Mealy machine with
input-slash-output instructions. An edge labeled p/q says, "If the Tape Head is reading a p,
change it to q and follow this arrow to the next state." The edge itself does not indicate in
which direction the Tape Head is to be moved. The instructions for moving the Tape Head are
found once we enter the next state. Inside the circles denoting states, we have labels that are
name-slash-move indicators. For example, 4/L says, "You have entered state 4; please move the
Tape Head one cell to the left." When we commence running the machine in the START state,
we do not execute its move instruction. If we reenter the START state, then we follow its move
instruction.

It is not obvious that move-in-state machines have the same power as TMs. Why is that?
Because move-in-state machines are limited to always making the same Tape Head move
every time we enter a particular state, whereas with TMs we can enter a given state having
moved the Tape Head either left or right. For example, the TM situations:

cannot simply be converted into move-in-state machines by adding Tape Head moving
instructions into state 9. However, we can get around this difficulty in a way analogous to the
method we used for converting Mealy into Moore machines. The next two theorems prove

Move-in-state = TM 


THEOREM 52 

For every move-in-state machine M, there is a TM, T, which accepts the same language.
That is, if M crashes on the input w, T crashes on the input w. If M loops on the input w, T
loops on the input w. If M accepts the input w, then T does too. We require even more: after
halting, the two machines leave exactly the same scattered symbols on the Tape.


PROOF 

The proof will be by constructive algorithm. 

This conversion algorithm is simple. One by one, in any order, let us take every edge in
M and change its labels. If the edge leads to a state that tells the Tape Head to move right,
change its labels from X/Y to (X, Y, R). If the edge leads to a state that tells the Tape Head to
move left, change its labels from X/Y to (X, Y, L). To make this description complete, we
should say that any edge going into the HALT state should be given the Tape Head move
instruction R.

When all edge labels have been changed, erase the move instructions from inside the 
states. For example, 


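[Figure: a move-in-state fragment with edges labeled b/X and Δ/b]

becomes

[Figure: the corresponding TM fragment, with each move taken from the state the edge enters]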

The resulting diagram is a TM in normal form that operates exactly as the move-in-state
machine did. The trace of a given input on the move-in-state machine is the same as the trace
of the same input on the converted TM. ■
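The relabeling in this proof is a one-line transformation per edge. Here is a sketch with a representation of our own choosing: edges are (source, read, write, target) tuples, `moves` maps each non-HALT state to "L" or "R", and the HALT state is assumed to be named "HALT".

```python
def move_in_state_to_tm(edges, moves, halt="HALT"):
    """Theorem 52 conversion (sketch): move each state's move onto its incoming edges.

    edges: (source, read, write, target) tuples of a move-in-state machine.
    moves: {state: "L" or "R"} for every non-HALT state.
    Returns TM edges (source, target, (read, write, move)).
    """
    tm_edges = []
    for src, read, write, dst in edges:
        # The move comes from the state the edge enters; edges into HALT get R.
        move = "R" if dst == halt else moves[dst]
        tm_edges.append((src, dst, (read, write, move)))
    return tm_edges
```

After this step the move labels inside the states are simply discarded, as the proof says.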


EXAMPLE 

The move-in-state machine above that copies input words will be converted by the algorithm 
given in this proof into the following TM: 
[Figure: the resulting TM, whose loops are labeled (a, b; =, R)]

THEOREM 53

For every TM T, there is a move-in-state machine M that operates in exactly the same way
on all inputs—crashing, looping, or accepting. Furthermore, the move-in-state machine will 
always leave the same remnants on the Tape that the TM does. 


PROOF 

The proof will be by constructive algorithm. 

We cannot simply “do the reverse” of the algorithm in the last proof. If we try to move 
the Tape Head instructions from the edges into the states themselves, we sometimes succeed 
and sometimes fail, depending on whether all the edges entering a given state have the same 
Tape Head direction or not. This is a case of deja vu. We faced the same difficulty when 
converting Mealy machines into Moore machines—and the solution is the same. If edges 
with different Tape Head movement directions feed into the same state, we must make two 
copies of that state, one labeled move R and one labeled move L, each with a complete set of 
the same exit edges the original state had. The incoming edges will then be directed into 
whichever state contains the appropriate move instruction. 

For example, 



becomes 
Some states become twins; some remain single. State by state we make this conversion
until the TM is changed into a move-in-state machine that acts on inputs identically to the 
way the old TM used to. 

If the START state has to split, only one of its clones can still be called START—it 
does not matter which, because the edges coming out of both are the same. 

If a state that gets split loops back to itself, we must be careful to which of its clones the 
loops go. It all depends on what was printed on the loop edge. A loop labeled with an R will 
become a loop on the R twin and an edge from the L twin. The symmetric thing happens to a 
TM edge with an L move instruction. 

This process will always convert a TM into an equivalent move-in-state machine,
equivalent both in the sense of language-acceptor and in the sense of TAPE-manipulator. ■
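The state-splitting construction can be sketched as follows. The representation and names are ours: TM edges are (source, read, write, move, target) tuples, and each clone is a (state, move) pair. One clone is made per direction of incoming edge; every clone keeps a copy of all exit edges; and an exit edge with move m is redirected to the m-twin of its target. Because START's move is not executed on the first entry, we arbitrarily label it R when nothing enters it; that labeling is an assumption of this sketch.

```python
from collections import defaultdict


def tm_to_move_in_state(edges, start):
    """Theorem 53 conversion (sketch).

    edges: TM edges (source, read, write, move, target), move in {"L", "R"}.
    Returns (clones, new_edges): clones are (state, move) pairs, and each new
    edge is (source_clone, read, write, target_clone).
    """
    incoming = defaultdict(set)
    for _, _, _, move, dst in edges:
        incoming[dst].add(move)            # one clone per incoming direction
    if not incoming[start]:
        incoming[start].add("R")           # START's move is unused on the first entry
    clones = {(q, m) for q, ms in incoming.items() for m in ms}
    new_edges = []
    for src, read, write, move, dst in edges:
        for m in incoming[src]:            # every clone keeps all exit edges
            new_edges.append(((src, m), read, write, (dst, move)))
    return clones, new_edges
```

A loop on state 2 labeled with R shows the behavior described in the proof: it stays a loop on the R twin and becomes an edge from the L twin to the R twin.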


EXAMPLE 


Let us consider the following purely random TM: 



When the algorithm of the preceding theorem is applied to the states of this TM in
order, we obtain the following conversion:




Notice that HALT 3 is the same as writing HALT 3/R, but if the edge entering HALT
moved left, we would need a different state, because the input might then crash while going into
the HALT state. ■


We have been careful to note that when we combine the last two theorems into one
statement

TM = move-in-state machine 

we are not merely talking about their power as language-recognizers, but as transducers as 
well. Not only do the same words run to HALT on the corresponding machines, but also they 
leave identical outputs on the input Tape. The importance of this point will be made clear 
later. 

THE STAY-OPTION MACHINE 

Another variation on the definition of the TM that is sometimes encountered is the
"stay-option" machine. This is a machine exactly like a TM except that along any edge we have
the option of not moving the Tape Head at all: the stay option. Instead of writing L or R as
directions to the Tape Head, we can also write S for "stay put."

On the surface, this seems like a ridiculous thing to do, because it causes us to read next
the character that we have just this instant printed. However, the correct use of the stay
option is to let us change states without disturbing the Tape or Tape Head, as in the example
below:



We stay in state 3 skipping over b's until we reach an a or a Δ. If we reach an a, we
jump to state 7 and there decide what to do. If we reach a Δ, we go to state 4, where more
processing will continue. In either case, we are reading the first of the new characters.

The question arises, “Does this stay option give us any extra real power, or is it merely a 
method of alternate notation?” Naturally, we shall once again prove that the stay option adds 
nothing to the power of the already omnipotent TM. 

EXAMPLE 

We have had some awkward moments in programming TMs, especially when we wanted to 
leave the Tape Head pointing to a special symbol such as a * in cell i or a # in between 
words. We used to have to write something like 


[Figure: states 7, 8, and 9; state 7 loops on (a, b; =, L)]



State 7 backs down the Tape looking for the *. State 8 finds it, but the Tape Head bounces 
off to the right. We then have to proceed to state 9 to leave the Tape Head pointing to the *. 
With the stay option this becomes easier:

[Figure: state 7, still looping on (a, b; =, L), with a stay-option edge replacing states 8 and 9]


DEFINITION

Let us call a TM with a stay option a stay-option machine.

We now show that the stay option, although it may be useful in shortening programs,
adds no new power to the TM.


THEOREM 54

stay-option machine = TM

In other words, for any stay-option machine there is some TM that acts the same way on all
inputs, looping, crashing, or accepting while leaving the same data on the Tape; and vice
versa.


PROOF

Because a TM is only a stay-option machine in which we have not bothered to use the stay
option, it is clear that for any TM there is a stay-option machine that does the same thing as the
TM itself. What remains for us to show is that if the stay option is ever used, we can replace it
with other TM programming and so convert a stay-option machine into an equivalent TM.

To do this, we simply follow this replacement rule. Change any edge

[Figure: an edge out of state 3 that uses the stay option S]

into

[Figure: the same edge with move R into a new state 3', followed by an edge (any; =, L) to the original target]

introducing a new state 3'. It is patently obvious that this does not change the processing of
any input string at any stage.

When all stay-option edges have been eliminated (even loops), what remains is the
desired regular TM. ■

Now that we have shown that the stay option is harmless, we shall feel free to use it in
the future when it is convenient.


EXAMPLE

Here, we shall build a simple machine to do some subtraction. It will start with a string of
the form #(0 + 1)* on its Tape. This is a # in cell i followed by some binary number. The job
of this stay-option machine is to subtract 1 from this number and leave the answer on the
Tape. This is a binary decrementer.

The basic algorithm is to change all the rightmost 0's (the trailing block of 0's) to 1's and
the rightmost 1 to 0. The only problem with this is that if the input is zero, that is, of the form
#0*, then the algorithm gives the wrong answer because we have no representation for negative numbers.

The machine below illustrates one way of handling this situation: 



What happens with this machine is 

START              #101001000

becomes state 1    #101001000Δ

becomes state 1    #101001111

becomes state 2    #101000111

If we are in state 2 and we are reading a 0, we must have arrived there by the edge
(1, 0, S), so in this case we proceed directly along the edge (0, 0, R) to HALT.

If, on the other hand, we arrive in state 2 from the edge (#, #, R), it means we started
with zero, #0*, on the Tape:

START              #0000

becomes state 1    #0000Δ
becomes state 1    #1111

becomes state 2    #1111
becomes state 2    #ΔΔΔΔΔ

In state 2, we erase all these mistaken 1's. If the input was zero, this machine leaves an
error message in the form of the single character #.
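The decrementer's logic can be checked with a short sketch. It follows what states 1 and 2 accomplish rather than simulating every Tape Head move, and the function name `decrement` is ours. We assume the input has the form #(0 + 1)* and represent the Tape as a string.

```python
BLANK = "Δ"


def decrement(tape: str) -> str:
    """Binary decrementer on a Tape '#' + (0 + 1)*, following states 1 and 2."""
    cells = list(tape)
    head = len(cells) - 1              # START has run right; back up to the last digit
    # State 1: move left, turning each trailing 0 into a 1.
    while cells[head] == "0":
        cells[head] = "1"
        head -= 1
    if cells[head] == "1":
        cells[head] = "0"              # edge (1, 0, S): stay put and enter state 2,
        return "".join(cells)          # which reads this 0 and halts via (0, 0, R)
    # Otherwise we reached # and took (#, #, R): the input was zero.
    # State 2 erases the mistaken 1's, leaving only the error message #.
    for i in range(head + 1, len(cells)):
        cells[i] = BLANK
    return "".join(cells).rstrip(BLANK)
```

This reproduces both traces above: #101001000 becomes #101000111, and #0000 is reduced to the lone #.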

In this machine, there is only one stay-option edge. Employing the algorithm from the
preceding theorem, we leave the state 1-state 2 edge (#, #, R) alone, but change the state
1-state 2 edge (1, 0, S) as follows:


There are some other minor variations of TMs that we could investigate. One is to allow
the Tape Head to move more than one cell at a time, such as

(X, Y, 3R) = (read X, write Y, move 3 cells to the right)


This is equivalent to

[Figure: a chain of three edges labeled (X, Y, R), (any; =, R), and (any; =, R)]


Some other instructions of this ilk are

(X, Y, 2L) or (X, Y, 33R)

It is clear that these variations do not change the power of a TM as acceptor or
transducer; that is, the same input strings are accepted and the stuff they leave on the Tape is the
same. This is, in fact, so obvious that we shall not waste a theorem on it.
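For the skeptical reader, the expansion is easy to mechanize. In this sketch (names and edge representation are ours), a single (X, Y, nD) edge becomes one ordinary edge that does the real read and write, followed by n − 1 edges labeled (any; =, D) through fresh intermediate states.

```python
def expand_multi_move(src, read, write, n, direction, dst):
    """Replace one (X, Y, nD) edge by n ordinary one-cell edges (sketch)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    chain = [src] + [f"{src}.{k}" for k in range(1, n)] + [dst]
    edges = [(chain[0], read, write, direction, chain[1])]   # the real read/write
    for k in range(1, n):
        edges.append((chain[k], "any", "=", direction, chain[k + 1]))
    return edges
```

A leftward multi-cell move that would run off cell i crashes partway along this chain, just as the original instruction would crash.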


THE k-TRACK TM

In addition to variations involving the move instructions, it is also possible to have variations 
on the Tape structure. The first of these we shall consider is the possibility of having more 
than one Tape. 

The picture below shows the possibility of having four Tapes stacked one on top of the 
other and one Tape Head reading them all at once: 


Tape 1    a  b  b  a  a
Tape 2    Δ  Δ  Δ  Δ  Δ
Tape 3    b  Δ  Δ  a  Δ
Tape 4    b  b  a  b  b


In this illustration, the Tape Head is reading cell iii of Tape 1, cell iii of Tape 2, cell iii 
of Tape 3, and cell iii of Tape 4 at once. The Tape Head can write something new in each of 
these cells and then move to the left to read the four cell ii’s or to the right to read the four 
cell iv’s. 


DEFINITION 

A k-track TM, or kTM, has k normal TM Tapes and one Tape Head that reads corresponding
cells on all Tapes simultaneously and can write on all Tapes at once. There is also an
alphabet of input letters Σ and an alphabet of Tape characters Γ. The input strings are taken
from Σ, while the Tape Head can write any character from Γ.

There is a program of instructions for the Tape Head consisting of a START state, 
HALT states, other states, and edges between states labeled 



   ( p, t )
   ( q, u )
   ( r, v )   M
   ( s, w )
   (  . . .  )

where p, q, r, s, t, u, v, w, . . . are all in Γ and M is R or L, meaning that if what is read
from Tape 1 is p, from Tape 2 is q, from Tape 3 is r, from Tape 4 is s, and so on, then what
will be written on Tape 1 is t, on Tape 2 is u, on Tape 3 is v, on Tape 4 is w, and so on. The
Tape Head will be moved in the direction indicated by M.

To operate a kTM, we start with an input string from Σ* on Tape 1 starting in cell i, and
if we reach HALT, we say that the string is in the language of the kTM. We also say that the
content of all the Tapes is the output produced by this input string. ■

This is a very useful modification of a TM. In many applications, it allows a natural
correspondence between the machine algorithm and traditional hand calculation, as we can see
from the examples below. Notice that we use the words track and Tape interchangeably for a
kTM.
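To make the definition concrete, here is a small Python sketch of a kTM interpreter. It assumes a hypothetical encoding: the program is a dictionary from (state, column read) to (next state, column written, move), a column is a k-tuple with one entry per track, and Δ stands for the blank. The step bound max_steps is an assumption of the sketch; a real machine can simply run forever.

```python
BLANK = "Δ"


def run_ktm(program, start, halts, tracks, max_steps=10_000):
    """Run a k-track TM whose single Tape Head reads and writes one column at a time.

    program maps (state, column_read) -> (next_state, column_written, move),
    where columns are k-tuples and move is "L" or "R".  tracks is a list of k
    strings with the initial contents of each track.  Returns (outcome, tracks)
    with outcome "halt", "crash", or "gave up" (step bound reached).
    """
    width = max(len(t) for t in tracks)
    cells = [list(t.ljust(width, BLANK)) for t in tracks]
    state, head = start, 0  # head 0 is cell i

    def snapshot():
        return ["".join(row) for row in cells]

    for _ in range(max_steps):
        if state in halts:
            return "halt", snapshot()
        if head == len(cells[0]):  # extend every track by one blank column
            for row in cells:
                row.append(BLANK)
        column = tuple(row[head] for row in cells)
        if (state, column) not in program:
            return "crash", snapshot()  # no edge for this column
        state, written, move = program[(state, column)]
        for row, symbol in zip(cells, written):
            row[head] = symbol  # write on all tracks at once
        head += 1 if move == "R" else -1
        if head < 0:
            return "crash", snapshot()  # moved left from cell i
    return "gave up", snapshot()
```

For example, a 2TM with the three edges (a, a / Δ, a; R), (b, b / Δ, b; R), and (Δ, Δ / Δ, Δ; R) into HALT copies track 1 onto track 2.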


EXAMPLE 






When a human adds a pair of numbers in base 10, the algorithm followed is usually to line 
them up in two rows right-adjusted, find the right-hand column, and perform the addition 
column by column moving left, remembering whether there are carries and stopping when 
the last column has been added. 

The following 3TM performs this algorithm exactly as we were taught in third grade except 
that it uses a column of $’s to mark the left edge. Track 1 and track 2 contain the numbers to be 
added and track 3 is all blanks. The total will be found on track 3 when we reach HALT. 


[Figure: the 3TM adder, with states START, no-carry, owe-carry, and HALT]

The loop from no-carry back to itself takes care of all combinations:


   ( u, = )
   ( v, = )        L
   ( Δ, u + v )


CHAPTER 22 Variations on the TM 

where u + v is less than 10. 

The edges from no-carry to owe-carry are labeled 


   ( u, = )
   ( v, = )             L
   ( Δ, u + v − 10 )

where u + v ≥ 10.

The loop from owe-carry back to itself is 


   ( u, = )
   ( v, = )                 L
   ( Δ, u + v + 1 − 10 )

where u + v ≥ 9.

The edge from owe-carry to no-carry is 


   ( u, = )
   ( v, = )            L
   ( Δ, u + v + 1 )

where u + v ≤ 8.

We trace this input on this 3TM: 

     START      START      START      START      START
     $429       $429       $429       $429       $429Δ
     $933   →   $933   →   $933   →   $933   →   $933Δ
     $ΔΔΔ       $ΔΔΔ       $ΔΔΔ       $ΔΔΔ       $ΔΔΔΔ

   → No-carry   Owe-carry   No-carry   Owe-carry   HALT
     $429       $429        $429       $429        Δ429
     $933   →   $933    →   $933   →   $933    →   Δ933
     $ΔΔΔ       $ΔΔ2        $Δ62       $362        1362

The correct total, 1362, is found on Tape 3 only. The data left on the other Tapes is not 
part of the answer. We could have been erasing Tape 1 and Tape 2 along the way, but this 
way is closer to what humans do. 

We could have started with both input numbers on Tape 1 and let the machine transfer 
the second number to Tape 2 and put the $’s in the cell i’s. These chores are not difficult. ■ 
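The column-by-column logic of the adder can be sketched in Python. This mirrors the edge rules above rather than simulating a full Tape Head: from no-carry write u + v, or u + v − 10 and switch to owe-carry; from owe-carry write u + v + 1, or u + v + 1 − 10 and stay in owe-carry; every edge moves L. Padding the shorter number with leading 0s and writing Δ when no-carry reaches the $ column are assumptions of this sketch, since the machine diagram is not reproduced here; writing 1 when owe-carry reaches $ follows the trace above.

```python
BLANK = "Δ"


def add_like_3tm(top, bottom):
    """Add two decimal strings column by column, the way the 3TM adder does.

    Tracks 1 and 2 hold the numbers after a $ in cell i; track 3 starts blank.
    Assumption: the shorter number is padded with leading 0s.
    """
    width = max(len(top), len(bottom))
    track1 = ["$"] + list(top.rjust(width, "0"))
    track2 = ["$"] + list(bottom.rjust(width, "0"))
    track3 = ["$"] + [BLANK] * width
    state = "no-carry"
    col = width  # START has already run right to the last column
    while True:
        u, v = track1[col], track2[col]
        if u == "$":  # reached the left-edge column: the last carry (if any) is a 1
            track3[col] = "1" if state == "owe-carry" else BLANK
            return "".join(track3).lstrip(BLANK)
        total = int(u) + int(v) + (1 if state == "owe-carry" else 0)
        track3[col] = str(total % 10)  # u+v, u+v-10, u+v+1, or u+v+1-10
        state = "owe-carry" if total >= 10 else "no-carry"
        col -= 1  # every edge moves L
```

Called as add_like_3tm("429", "933"), it passes through the same states as the trace above and leaves 1362 on track 3.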

Considering TMs as transducers has not seemed very important to us before. In a PDA,
we never considered the possibility that what was left in the STACK when the input was
accepted had any deep significance. Usually, it was nothing. In our early TM examples, the
Tape often ended up containing random garbage. But, as the example above shows, the
importance of the machine might not be simply that the input was accepted, but what output
was generated in the process. This is a theme that will become increasingly important to us
as we approach the back cover.

We should now have a theorem that says that kTMs have no greater power than TMs do
as either acceptors or transducers. This is true, but before we prove it, we must discuss what
it means. As we have defined it, a kTM starts with a single line of input just as a TM does.
However, the output from a kTM is presumed to be the entire status of all k Tapes. How can
a TM possibly hope to have output of this form? We shall adopt a convention of
correspondence that employs the interlacing cells on one Tape to simulate the multiplicity of
kTM Tapes.

We say that the 3TM Tape status


   track 1:   a   d   g   . . .
   track 2:   b   e   h   . . .
   track 3:   c   f   i   . . .

corresponds to the one-TAPE TM status

   a   b   c   d   e   f   g   h   i   . . .

This is an illustration for three tracks, but the principle of correspondence we are using
applies equally well to k tracks.
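The convention of correspondence is just interlacing: read the kTM Tape column by column, top track first. A minimal Python sketch of both directions, assuming tracks and Tapes are represented as strings with Δ for blank (the function names are hypothetical):

```python
BLANK = "Δ"


def tracks_to_tape(tracks):
    """Read a kTM Tape column by column, top track first, onto one TM Tape."""
    width = max(len(t) for t in tracks)
    padded = [t.ljust(width, BLANK) for t in tracks]
    return "".join(padded[r][c] for c in range(width) for r in range(len(padded)))


def tape_to_tracks(tape, k):
    """Inverse direction: cell j of the TM Tape (counting from 0) goes to
    track j % k, column j // k.  A short Tape is padded with blanks."""
    columns = -(-len(tape) // k)  # ceiling division
    padded = tape.ljust(columns * k, BLANK)
    return [padded[r::k] for r in range(k)]
```

With this, the 3TM status a d g / b e h / c f i maps to abcdefghi, and a 3TM whose only data is abcd on track 1 corresponds to aΔΔbΔΔcΔΔdΔΔ rather than to abcd, which is exactly the difficulty the proof below has to deal with.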

We can now prove our equality theorem. 

THEOREM 55 

Part 1 Given any TM and any k, there is a kTM that acts on all inputs exactly as the
TM does (that means either loops, crashes, or leaves a corresponding output).

Part 2 Given any kTM for any k, there is a TM that acts on all inputs exactly as the
kTM does (that means loops, crashes, or leaves a corresponding output).

In other words, as an acceptor or transducer,

TM = kTM

PROOF 

Proof of Part 1 

One might think that Part 1 of this proof is trivial. All we have to do is leave Tape 2,
Tape 3, . . . , Tape k always blank and change every TM edge label from (X, Y, Z) in the
original TM into

   ( X, Y )
   ( Δ, Δ )
   (  . . .  )   Z
   ( Δ, Δ )

The end result on Tape 1 will be exactly the same as on the original TM. This would be fine
except that under our definition of correspondence

   track 1:   a   b   c   d   . . .
   track 2:   Δ   Δ   Δ   Δ   . . .
   track 3:   Δ   Δ   Δ   Δ   . . .

does not correspond to the TM Tape status

   a   b   c   d   . . .

but rather to the TM Tape status

   a   Δ   Δ   b   Δ   Δ   c   Δ   Δ   d   Δ   Δ   . . .


To have a kTM properly correspond to a TM once we have adopted our definition of
correspondence, we must convert the answer Tape on the kTM from

   track 1:   a   b   c   d   . . .
   track 2:   Δ   Δ   Δ   Δ   . . .
   track 3:   Δ   Δ   Δ   Δ   . . .

into this form

   track 1:   a   d   . . .
   track 2:   b   Δ   . . .
   track 3:   c   Δ   . . .

The subroutine to do this begins as follows: 



This notation should be transparent. The arrow from “any” to “=” means that into the
location of the “=” we shall put whatever symbol occupied the location of the “any.”

We now arrive at 


   track 1:   a   Δ   Δ   d   . . .
   track 2:   Δ   b   Δ   Δ   . . .
   track 3:   Δ   Δ   c   Δ   . . .


We need to write a variation of the DELETE subroutine that will delete a character from one 
row without changing the other two rows. 

To do this, we start with the subprogram DELETE exactly as we already constructed it
in Chapter 19 and we make k (in this case, 3) offshoots of it. In the first, we replace every
edge label as follows:

To get out of this endless loop, all we need is an end-of-data marker and a test to tell us
when we have finished converting the answer on track 1 into the k-track form of the answer. We
already know how to insert these things, so we call this the conclusion of the proof of Part 1.

Proof of Part 2 

We shall now show that the work of a kTM can be performed by a simple TM. Surprisingly,
this is not so hard to prove.

Let us assume that the kTM we have in mind has k = 3 and uses the Tape alphabet
Γ = {a b $}. (Remember, Δ appears on the Tape but is not an alphabet letter.) There are
only 4 × 4 × 4 = 64 different possibilities for columns of Tape cells. They are


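The TM we shall use to simulate the 3TM will have a Tape alphabet of 64 + 3 characters:

   Γ = {a, b, $, [a/a/a], [a/a/b], . . . , [Δ/Δ/Δ]}

where [x/y/z] stands for the single character whose top, middle, and bottom entries are x, y,
and z.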


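We are calling symbols such as [a/b/$] (a column with a on top, b in the middle, and $ on
the bottom) a single Tape character, meaning that it can fit into one cell of the TM and can
be used in the labels of the edges in the program. For example, an edge label whose read and
write entries are both such triple-decker characters will be a legal simple instruction on our
simple TM.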

These letters are admittedly very strange, but so are some others soon to appear. 

We are now ready to simulate the 3TM in three steps: 

Step 1 The input string X1 X2 X3 . . . will be fed to the 3TM on Tape 1 looking like
this:

   track 1:   X1   X2   X3   Δ   . . .
   track 2:   Δ    Δ    Δ    Δ   . . .
   track 3:   Δ    Δ    Δ    Δ   . . .

Because our TM is to operate on the same input string, it will begin like this:

   X1   X2   X3   Δ   . . .
To begin the simulation, we must convert the whole string to triple-decker
characters corresponding to the 3TM. We could use something like these
instructions:



We must have some way of telling when the string of X’s is done. Let us say
that if the X’s are a simple input word, they contain no Δ’s and therefore we are
done when we reach the first blank. The program should be



We shall now want to rewind the Tape Head to cell i so we should, as usual,
have marked cell i when we left it so that we could back up without crashing.
(This is left as a problem below.) If the 3TM ever needs to read cells beyond
the initial ones used for the input string, the simulating TM will have to
remember to treat the new Δ’s encountered as though they were the triple-decker
blank character whose three entries are all Δ.


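Step 2 Copy the 3TM program exactly for use by the simulating TM. Every 3TM
instruction that reads p, q, r on the three tracks, writes t, u, v, and moves M
becomes the instruction that reads the single triple-decker character with entries
p, q, r (top to bottom), writes the single triple-decker character with entries
t, u, v, and moves M,

which is a simple TM instruction.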
Step 3 If the 3TM crashes on a given input, so will the TM. If the 3TM loops
on a given input, so will the simple TM. If the 3TM reaches a HALT state, we
need to decode the answer on the TM. This is because the 3TM final result


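   track 1:   d   g   j   m   Δ   . . .
   track 2:   e   h   k   Δ   Δ   . . .
   track 3:   f   i   l   Δ   Δ   . . .

will sit on the TM as (each [x/y/z] being one triple-decker character):

   [d/e/f]   [g/h/i]   [j/k/l]   [m/Δ/Δ]   [Δ/Δ/Δ]   . . .

but the TM Tape status corresponding to the 3TM answer is actually

   d   e   f   g   h   i   j   k   l   m   Δ   Δ   Δ   Δ   Δ   . . .

We must therefore convert the TM Tape from triple-decker characters to
single-letter strings.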

This requires a state with 64 loops like the one below: 


[Figure: the Expander state with one of its 64 loops]


Once the answer has been converted into a simple string, we can halt. To know
when to halt is not always easy because we may not always recognize when the
3TM has no more non-Δ data. Reading a long run of all-blank triple-decker
characters, such as ten of them in a row,

does not necessarily mean that we have transcribed all the useful data
from the 3TM. However, we can tell when the simple TM is finished expanding
triples. When the expander state reads a single Δ, it knows that it has reached
that part of the original TM Tape not needed in the simulation of the 3TM. So
we add the branch

[Figure: the branch out of the Expander state taken on reading a single Δ]

This completes the conversion of the 3TM to a TM. The algorithm for k other than
3 is entirely analogous.
We shall save the task of providing concrete illustrations of the algorithms in this
theorem for the Problems section.
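The three steps of the proof of Part 2 can be sketched in Python by letting a tuple stand for a triple-decker character. Everything here is an illustrative assumption: the names, and the choice to represent a plain blank cell as the string Δ and a triple-decker cell as a 3-tuple. The sketch covers only the alphabet count, the input encoding of Step 1, the edge translation of Step 2, and the Expander of Step 3, not a full simulation.

```python
from itertools import product

BLANK = "Δ"


def column_alphabet(tape_letters):
    """Every possible column (top, middle, bottom) over the letters plus the blank.
    For the alphabet {a, b, $} this gives 4 * 4 * 4 = 64 triple-decker characters."""
    return list(product(list(tape_letters) + [BLANK], repeat=3))


def encode_input(word):
    """Step 1: X1 X2 X3 ... becomes the characters (X1, Δ, Δ), (X2, Δ, Δ), ..."""
    return [(x, BLANK, BLANK) for x in word]


def translate_instruction(reads, writes, move):
    """Step 2: a 3TM edge reading the column (p, q, r), writing (t, u, v), and
    moving M becomes a simple TM edge reading one character and writing one."""
    return (tuple(reads), tuple(writes), move)


def expand_output(tape):
    """Step 3, the Expander: spell out each triple-decker cell as three letters and
    stop at the first plain blank, the part of the Tape the simulation never used."""
    letters = []
    for cell in tape:
        if cell == BLANK:
            break
        letters.extend(cell)
    return "".join(letters)
```

Applied to the final 3TM result d g j m / e h k Δ / f i l Δ shown above, followed by an untouched plain blank, expand_output yields defghijklmΔΔ, the corresponding one-Tape status.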


THE TWO-WAY INFINITE TAPE MODEL

The next variation of a TM we shall consider is actually Turing’s own original model. He did
not use the concept of a “half-infinite” Tape. His Tape was infinite in both directions, which
we call doubly infinite, or two-way infinite. (The Tapes as we defined them originally are
sometimes called one-way infinite Tapes.)

The input string is placed on the Tape in consecutive cells somewhere and the rest of the
Tape is filled with blanks. There are infinitely many blanks to the left of the input string as
well as to the right of it. This seems to give us two advantages:

1. We do not have to worry about crashing by moving left from cell i, because we can
always move left into some ready cell.

2. We have two work areas, not just one, in which to do calculation, because we can use the
cells to the left of the input as well as those farther out to the right.

By convention, the Tape Head starts off pointing to the leftmost cell containing nonblank
data.

The input string abba would be depicted as

   . . .   Δ   Δ   a   b   b   a   Δ   Δ   . . .
                   ↑

We shall number the cells once an input string has been placed on the Tape by calling
the cell the Tape Head points to cell i. The cells to the right are numbered as usual with
increasing lowercase Roman numerals. The cells to the left are numbered with zero and
negative lowercase Roman numerals. (Let us not quibble about whether the ancient Romans
knew of zero and negative numbers.)


THEOREM 56 

TMs with two-way Tapes are exactly as powerful as TMs with one-way Tapes as both
language-acceptors and -transducers.


PROOF 


The proof will be by constructive algorithm. 

First, we must show that every one-way TM can be simulated by a two-way TM. We 
cannot get away with saying, “Run the same program on the two-way TM and it will give 
the same answer” because in the original TM if the Tape Head is moved left from cell i, the 
input crashes, whereas on the two-way TM it will not crash. To be sure that the two-way TM
does crash every time its Tape Head enters cell 0, we must proceed in a special way.

Let © be a symbol not used in the alphabet Γ for the one-way TM. Insert © in cell
0 on the two-way TM and return the Tape Head to cell i:

   START --(any, =, L)--> ○ --(Δ, ©, R)--> ○

From here, let the two-way TM follow the exact same program as the one-way TM. 

Now if, by accident, while simulating the one-way TM, the two-way TM ever moves 
left from cell i, it will not crash immediately as the one-way TM would, but when it tries to 
carry out the next instruction, it will read the © in cell 0 and find that there is no edge for 
that character anywhere in the program of the one-way machine. This will cause a crash, and 
the input word will be rejected. 

One further refinement is enough to finish the proof. (This is one of the subtlest of
subtleties in anything we have yet seen.) The one-way TM may end on the instruction

[Figure: a final edge with a left move leading into HALT]

where this left move could conceivably cause a crash on the one-way TM, preventing
successful termination at HALT, while the two-way TM would reach HALT without actually
reading the contents of cell 0, merely moving in. To be sure that the two-way TM also
crashes in its simulation, it must read the last cell it moves to. We must
change the one-way TM program to



We have yet to prove that anything a two-way TM can do can also be done by a one-way
TM. And we will not. What we shall prove is that anything that can be done by a two-way
TM can be done by some 3TM. Then by the previous theorem there is a one-way TM
which can do anything this 3TM can do.

Let us start with some particular two-way TM. Let us wrap the doubly infinite Tape 
around to make the figure below: 


   track 1:   cell i   cell ii   cell iii   cell iv    cell v    . . .
   track 2:
   track 3:   cell 0   cell −i   cell −ii   cell −iii  cell −iv  . . .

Furthermore, let us require every cell in the middle row to contain one of the following
five symbols: Δ, ↑, ↓, ↑↑, ↓↓.

The single arrows will tell us which of the two cells in the column we are actually
reading. The double arrows, for the tricky case of going around the bend, will appear only in
the first column. The middle track will always contain one double arrow, at most one single
arrow, and Δ’s for all the rest.
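Folding the two-way Tape amounts to an index calculation. A minimal Python sketch, assuming integer cell numbers with 1 for cell i, 0 for cell 0, and −1, −2, . . . for cells −i, −ii, . . . (fold, unfold, and move_right are hypothetical names). It shows why a right move in the negative cells becomes a left move on the 3TM, and why the move from cell 0 to cell i stays in the same column.

```python
def fold(n):
    """Two-way cell number -> (track, column) on the folded 3TM.

    n = 1, 2, 3, ... are cells i, ii, iii, ... and go on track 1;
    n = 0, -1, -2, ... are cells 0, -i, -ii, ... and go on track 3.
    Columns are counted from 0 at the bend.
    """
    return (1, n - 1) if n >= 1 else (3, -n)


def unfold(track, column):
    """Inverse of fold."""
    return column + 1 if track == 1 else -column


def move_right(n):
    """How a two-way right move out of cell n looks on the 3TM."""
    (t0, c0), (t1, c1) = fold(n), fold(n + 1)
    if t0 != t1:
        return "S"  # around the bend, cell 0 to cell i: same column, other track
    return "R" if c1 > c0 else "L"
```

For example, move_right(-3) is "L": moving right from cell −iii reaches cell −ii, which sits one column to the left on the 3TM.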
If we are in a positively numbered cell and we wish to simulate on the 3TM the two-way
TM instruction

we can simply write this as



where S is the stay option for the Tape Head. The second step is necessary to move the
arrow on track 2 to the correct column. We do not actually need S. We could always move one
more left and then back.

For example,

[Figure: a 3TM Tape status over columns labeled 0, i, ii, iii, iv, with a and b on track 1]

causes

[Figure: the resulting 3TM Tape status]

Analogously,

[Figure: a 3TM Tape status with a b b a Δ on track 1]

causes

[Figure: the resulting 3TM Tape status]
If we were in a negatively numbered cell on the two-way TM and asked to move R, we
would need to move left in the 3TM.



could become

[Figure: the corresponding 3TM edges, labeled with (any, =) entries and a stay move S]

This is because in the two-way TM moving right from cell −iii takes us to cell −ii,
which in the 3TM is to the left of cell −iii.

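In the two-way TM, the Tape status

[Figure: a two-way Tape status around cells −ii and −i]

and the instruction

[Figure: the resulting two-way Tape status over cells −iii, −ii, −i]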

Analogously, in the 3TM the Tape status

              i    ii   iii   iv     v
   track 1:   Δ    Δ    Δ     Δ      Δ     . . .
   track 2:   ↓↓   Δ    Δ     ↓      Δ     . . .
   track 3:   b    a    a     b      Δ     . . .
              0    −i   −ii   −iii   −iv

and the instructions

[Figure: the corresponding 3TM edges, labeled with (any, =) entries and a stay move S]

will cause the result
The tricky part comes when we want to move right from cell 0. That we are in cell 0 can
be recognized by the double down arrows on the middle Tape.



can also be 



This means that we are now reading cell i, having left a Δ in cell 0.

There is one case yet to mention. When we move from cell −i to the right to cell 0, we
do not want to lose the double arrows there. So instead of just



we also need 



The full 3TM equivalent to the two-way TM instruction 



is therefore 
By analogous reasoning, the equivalent of the left move


is therefore

[Figure: the 3TM edges equivalent to a two-way left move (X, Y, L), with states 3′, 3″, and 8]


where 3′ is used when moving left from a negative cell, 3″ for moving left from a positive
cell, the second label on 3″ to 8 for moving left from cell ii into cell i, and the bottom edge
for moving left from cell i into cell 0.

We can now change the program of the two-way TM instruction by instruction (edge by 
edge) until it becomes the analogous program for the 3TM. 

Any input that loops/crashes on the two-way TM will loop/crash on the 3TM. If an input
halts, the output found on the two-way TM corresponds to the output found on the 3TM
as we have defined correspondence. This means it is the same string, wrapped around. With
a little more effort, we could show that any string found on track 1 and track 3 of a 3TM can
be put together on a regular half-infinite Tape TM.

Because we went into this theorem to prove that the output would be the same for the 
one-way and two-way TMs, but we did not make it explicit where on the one-way TM Tape 
the output has to be, we can leave the matter right where it is and call this theorem proven. ■ 


EXAMPLE 


The following two-way TM takes an input string and leaves as output the a-b complement of
the string; that is, if abaaa is the input, we want the output to be babbb.

The algorithm we follow is this:

1. In cell 0, place a *.

2. Find the last nonblank letter on the right and erase it. If it is a *, halt; if it is an a, go to
step 3; if it is a b, go to step 4.

3. Find the first blank on the left, change it to a b, and go to step 2.

4. Find the first blank on the left, change it to an a, and go to step 2.

The action of this algorithm on abaaa is

   abaaa
   *abaaa
   *abaa
   b*abaa
   b*aba
   bb*aba
   bb*ab
   bbb*ab
   bbb*a
   abbb*a
   abbb*
   babbb*
   babbb


If we follow this method, the output is always going to be left in the negatively numbered
cells. However, on a two-way Tape this does not have to be shifted over to start in cell
i since there is no way to distinguish cell i. The output is

   . . .   Δ   b   a   b   b   b   Δ   . . .

which can be considered as centered on the Tape (infinitely many Δ’s to the right, infinitely
many Δ’s to the left).

The program for this algorithm is

[Figure: the two-way TM program for the a-b complement algorithm]

The task of completing this picture is left for obsessive compulsives.
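Short of completing the picture, the algorithm itself can be checked with a minimal Python sketch. It assumes the two-way Tape is modeled as a dictionary from cell numbers to symbols (absent keys are blank), with 1 for cell i and 0 for cell 0. The sketch jumps straight to the last nonblank cell and to the first blank on the left instead of moving the Tape Head one cell at a time.

```python
def complement_two_way(word):
    """a-b complement on a two-way infinite Tape, following steps 1-4 of the text.

    The Tape is a dict from cell number to symbol; cells not in the dict are
    blank.  Cell 1 is cell i, where the input starts, and cell 0 is to its left.
    """
    tape = {k + 1: ch for k, ch in enumerate(word)}
    tape[0] = "*"                      # step 1
    while True:
        last = max(tape)               # step 2: last nonblank cell on the right
        letter = tape.pop(last)        # ... and erase it
        if letter == "*":
            break                      # halt
        # steps 3 and 4: nonblank cells are contiguous, so the first blank
        # on the left is just beyond the leftmost nonblank cell
        first_blank = min(tape) - 1
        tape[first_blank] = "b" if letter == "a" else "a"
    return "".join(tape[c] for c in sorted(tape))
```

Running it on abaaa passes through the same Tape contents as the trace above and returns babbb, all in negatively numbered cells.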








There are other variations possible for TMs. We recapitulate the old ones and list some 
new ones below: 

Variation 1 Move-in-state machines 

Variation 2 Stay-option machines 



Variation 3 Multiple-track machines 

Variation 4 Two-way infinite Tape machines 

Variation 5 One Tape, but multiple Tape Heads 

Variation 6 Many Tapes with independently moving Tape Heads 

Variation 7 Two-dimensional Tape (a whole plane of cells, like infinitely many tracks) 

Variation 8 Two-dimensional Tape with many independent Tape Heads 

Variation 9 Make any of the above nondeterministic

At this point, we are ready to address the most important variation: nondeterminism. 






THE NONDETERMINISTIC TM 


DEFINITION 


A nondeterministic TM, or NTM, is defined like a TM, but allows more than one edge 
leaving any state with the same first entry (the character to be read) in the label; that is, in 
state Q if we read a Y, we may have several choices of paths to pursue: 



[Figure: state Q with several outgoing edges whose labels all begin with Y, one of them (Y, W, R)]


An input string is accepted by an NTM if there is some path through the program that 
leads to HALT, even if there are some choices of paths that loop or crash. ■ 


We do not consider an NTM as a transducer because a given input may leave many possible
outputs. There is even the possibility of infinitely many different outputs for one particular
input as below:

[Figure: an NTM with states START, 1, and HALT, including an edge labeled (Δ, Δ, R)]

This NTM accepts only the input word a, but it may leave on its Tape any of the infinitely
many choices in the language defined by the regular expression b*, depending on how
many times it chooses to loop in state 1 before proceeding to HALT.

For a nondeterministic TM, T, we do not bother to separate the two types of nonacceptance,
reject(T) and loop(T). A word can possibly take many paths through T. If some
loop, some crash, and some accept, we say that the word is accepted. What should we do
about a word that has some paths that loop and some that crash but none that accept? Rather
than distinguish crash from loop, we lump them together as not in the language Accept(T).

Two NTMs are considered equivalent as language-acceptors if 

Accept(T1) = Accept(T2)

no matter what happens to the other input strings.


THEOREM 57

NTM = TM


PROOF 

First, we show that any language accepted by an NTM can be accepted by a (deterministic) 
TM. The proof will be by constructive algorithm. We shall start with any NTM and construct 
a deterministic 3TM that accepts the same language. Because we know that 3TM = TM, this 
will complete the proof. ■ 

Let us start by numbering each edge in the entire NTM machine by adding a number label
next to each edge instruction. These extra labels do not influence the running of the
machine; they simply make description of paths through the machine easier. For example, the
NTM below:



(which does nothing interesting in particular) can be edge-instruction-numbered to look like:

[Figure: the same NTM with its 11 edges numbered, including 5 (a, Δ, R), 10 (Δ, X, L), and 11 (b, b, R)]

There is no special order for numbering the edge instructions. The only requirement is 
that each instruction receive a different number. 

In an NTM, every string of numbers determines at most one path through the machine 
(which also may or may not crash). The string of numbers 

1-5-6-10-10-11

represents the path 

START-state 1-state 1-state 3-state 3-state 3-HALT 

This path may or may not correspond to a possible processing of an input string—but it is a 
path through the graph of the program nonetheless. 

Some possible sequences of numbers are obviously not paths—for example, 

9-9-9-2-11 

2-5-6 

1-4-7-4-11 

The first does not begin at START, the second does not end in HALT, and the third asks edge 
7 to come after edge 4, but these do not connect. 
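Deciding whether a sequence of edge numbers is a path through the graph ignores the Tape entirely. A minimal Python sketch follows. Since the numbered NTM figure is not reproduced here, the edge table EDGES below is hypothetical, chosen only to agree with the path 1-5-6-10-10-11 and the three non-paths discussed above.

```python
def is_path(sequence, edges, start="START", halt="HALT"):
    """Is this sequence of edge numbers a path from START to HALT in the graph?

    edges maps an edge number to (from_state, to_state).  Tape contents are
    ignored, so a path need not correspond to the processing of any input.
    """
    if not sequence or any(n not in edges for n in sequence):
        return False
    if edges[sequence[0]][0] != start or edges[sequence[-1]][1] != halt:
        return False
    # consecutive edges must connect: each one ends where the next begins
    return all(edges[a][1] == edges[b][0] for a, b in zip(sequence, sequence[1:]))


# Hypothetical edge table (the numbered NTM figure is not reproduced here).
EDGES = {1: ("START", "1"), 2: ("START", "1"), 4: ("1", "2"), 5: ("1", "1"),
         6: ("1", "3"), 7: ("3", "2"), 9: ("2", "3"), 10: ("3", "3"),
         11: ("3", "HALT")}

if __name__ == "__main__":
    for seq in ([1, 5, 6, 10, 10, 11], [9, 9, 9, 2, 11], [2, 5, 6], [1, 4, 7, 4, 11]):
        print(seq, is_path(seq, EDGES))
```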

To have a path traceable by an input string, we have to be careful about the Tape contents
as well as the edge sequence. To do this, we propose a three-track TM on which the
first track has material we shall discuss later, the second track has a finite sequence of
numbers (one per cell) in the range of 1 to 11, and the bottom track has the input sequence to be
simulated, for example,

   track 1:   . . .
   track 2:   11   4   6   6   Δ   Δ   . . .
   track 3:   a    b   a   Δ   Δ   Δ   . . .

In trying to run an NTM, we shall sometimes be able to proceed in a deterministic 
way (only one possibility at a state), but sometimes we may be at a state from which 
there are several choices. At this point, we would like to telephone our mother and ask 
her advice about which path to take. Mother might say to take edge 11 at this juncture 
and she might be right; branch 11 does move the processing along a path that will lead to 
HALT. On the other hand, she might be way off base. Branch 11? Why, branch 11 is not 
even a choice at our current crossroads. (Some days mothers give better advice than other
days.)

One thing is true: if a particular input can be accepted by a particular NTM, then there is
some finite sequence of numbers (each no greater than the total number of instructions, 11 in the
NTM above) that label a path through the machine for that word. If mother gives us all
possible sequences of advice, one at a time, eventually one sequence of numbers will constitute
the guidance that will help us follow a path to HALT. If the input string cannot be accepted,
nothing mother can tell us will help. For simplicity, we presume that we ask mother’s advice
even at deterministic states.

So, our 3TM will work as follows: 

Track 1: On this track we run the input using mother’s advice.

Track 2: On this track we generate mother’s advice.

Track 3: On this track we keep a copy of the original input string.

If we are lucky and the string of numbers on track 2 is good advice, then track 1 will 
lead us to HALT. 

If the numbers on track 2 are not perfect for nondeterministic branching, then track 1 
will lead us to a crash. Track 1 cannot loop forever, because it has to ask mother’s advice at 
every state and mother’s advice is always a finite string of numbers. 

If mother’s advice does not lead to HALT, it will cause a crash or simply run out and we 
shall be left with no guidance. If we are to crash or be without mother’s advice, what we do 
instead of crashing is start all over again with a new sequence of numbers for track 2. We do 
the following: 

1. Erase track 1. 

2. Generate the next sequence of mother’s advice. 

3. Recopy the input from where it is stored on track 3 to track 1. 

4. Begin again to process track 1, making the branching shown on track 2. 

What does this mean: Generate the next sequence of mother’s advice? If the NTM we
are going to simulate has 11 edge labels, then mother’s advice is a word in the regular
language defined by

(1 + 2 + 3 + . . . + 11)*

We have a natural ordering for these words (the words are written with hyphens between the 
letters): 

One-letter words 1 2 3 . . . 9 10 11

Two-letter words 1-1 1-2 ... 1-11 2-1 2-2 2-3 .. . 11-11 

Three-letter words 1-1-1 1-1-2 1-1-3 . . . 11-11-10 11-11-11 

Four-letter words 1-1-1-1 . . . 

If a given input can be accepted by the NTM, then at least one of these words is good advice. 

Our 3TM works as follows: 

1. Start with Δ’s on track 1 and track 2 and the input string in storage on track 3.

2. Generate the next sequence of mother’s advice and put it on track 2. (When we start up,
the “next sequence” is just the number 1 in cell i.)

3. Copy track 3 onto track 1.

4. Run track 1, always referring to mother’s advice at each state. 

5. If we get to HALT, then halt. 

6. If mother’s advice is imperfect and we almost crash, then erase track 1 and go to step 2.

Mother’s advice could be imperfect in the following ways: 

i. The edge she advises us to take is unavailable at the state we are in. 

ii. The edge she advises is available, but its label requires that a different letter be read by 
the Tape Head than the letter our Tape Head is now reading from track 1. 

iii. Mother is fresh out of advice; for example, her advice on this round was a sequence of 
five numbers, but after five edges we are not yet in HALT. 
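The overall search can be sketched in Python. This is a simplification under stated assumptions: the three tracks collapse into ordinary variables (the advice sequence for track 2, a dictionary for track 1, and the untouched input word standing in for track 3), the program maps each edge number to a hypothetical (from_state, read, to_state, write, move) tuple, and max_length is a cutoff the real 3TM does not have, since on a rejected input it keeps trying advice forever. The three kinds of imperfect advice listed above all make run_with_advice return False.

```python
from itertools import product

BLANK = "Δ"


def run_with_advice(program, word, advice, start="START", halt="HALT"):
    """Follow one sequence of mother's advice on a fresh copy of the input.

    program maps an edge number to (from_state, read, to_state, write, move).
    Returns True only if the advice leads to HALT.
    """
    tape = dict(enumerate(word))  # track 1: cell i is key 0; missing keys are blank
    state, head = start, 0
    for number in advice:
        if state == halt:
            return True
        if number not in program:
            return False
        src, read, dst, write, move = program[number]
        if src != state or tape.get(head, BLANK) != read:
            return False  # edge unavailable here, or it reads a different letter
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return False  # moving left from cell i is a crash
        state = dst
    return state == halt  # False if the advice ran out before HALT


def ntm_accepts(program, word, max_length=8):
    """Try advice in the text's order: every one-number word, then every
    two-number word, and so on.  max_length is a cutoff for this sketch only."""
    numbers = sorted(program)
    for length in range(1, max_length + 1):
        for advice in product(numbers, repeat=length):
            if run_with_advice(program, word, advice):
                return True
    return False


# A hypothetical four-edge NTM that accepts words containing aa: in START it
# either skips a letter (edges 1, 2) or guesses that aa starts here (edge 3).
GUESS_AA = {
    1: ("START", "a", "START", "a", "R"),
    2: ("START", "b", "START", "b", "R"),
    3: ("START", "a", "1", "a", "R"),
    4: ("1", "a", "HALT", "a", "R"),
}

if __name__ == "__main__":
    print(ntm_accepts(GUESS_AA, "baab"))  # True, via the advice 2-3-4
    print(ntm_accepts(GUESS_AA, "abab"))  # False
```

Because itertools.product enumerates tuples in lexicographic order, the advice is tried in the same order as the list of one-letter, two-letter, and longer words given earlier.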

Let us give a few more details of how this system works in practice. We are at a certain
state reading the three tracks. Let us say they read





The bottom track does not matter when it comes to the operation of a run, only when it 
comes time to start over with new advice. 

We are in some state reading a and 6. If mother’s advice is good, there is an edge from 
the state we are in that branches on the input a. But let us not be misled; mother’s advice is 
not necessarily to take edge 6 at this juncture. 

To find the current piece of mother’s advice, we need to move the Tape Head to the first
unused number in the middle track. That is the correct piece of mother’s advice. After 30
edges, we are ready to read the thirty-first piece of mother’s advice. The Tape Head will
probably be off reading some different column of data for track 1, but when we need
mother’s advice, we have to look for it.

Our problem is that we have only one Tape Head but we want to keep track of where
we are on two different Tape tracks, and it would only be coincidence if the two active cells
were in the same column. What is worse is that we wish to alternate reading what is on track
1 and what is on track 2. After each Tape Head move on track 1, we want to go back to track
2 to get our directions, and then we want to return to track 1 to carry them out. Essentially,
what we must do is mark our spot on track 1 so we know how to return to it. We do this by
one of our favorite uses of artistic expression: blue paint. Let us assume that we have two
copies of the alphabet of Tape characters for track 1: one in black ink and one in blue. When
we have to leave track 1 to dig up our new instructions from track 2, we turn the character to
which the Tape Head was pointing into its blue version. When we wish to return to where
we were on track 1, we run the Tape Head up from cell i until we reach the blue letter. Then
we turn it back into black and resume execution of mother’s instruction.

Similarly, when we drop back to track 2 to get mother’s next instruction, we have to be
able to find out where we were in executing her advice so far. If we erase her advice as we
read it, it will be impossible to generate the lexicographically next string of mother’s advice
if we fail to accept the input through this set of instructions. We need to keep mother’s
advice intact but mark just how far along we are. The answer is blue paint, of course. The piece
of mother’s advice we are trying to follow will be painted blue as we leave. If following that
piece of advice does not cause a crash or lead to HALT, then we shall return for more
advice. We rewind the Tape Head to cell i and scan track 2 until we get to the blue instruction
number. This one we turn back to black and read the next one, turning it blue.

If we are out of mother’s advice, which we notice when the next cell on track 2 contains
a Δ, it is time to erase track 1, increment track 2, copy track 3 to track 1, rewind the Tape
Head to cell i, and read mother’s first instruction.

How can we actually implement these ideas in practice? The first thing we must do is to
insert end markers in cell i on all tracks. That is easy using the subprogram INSERT $. The
second thing we have to do is copy track 3 (which always keeps a pristine version of the
input to be simulated) onto track 1. This we do basically with the simple 3TM program
segment



We know that on our first iteration mother’s advice starts out simply as the number 1, but
exactly how we can increment it when the time comes is another question. We have already
seen incrementation done in binary in this chapter (p. 500), and incrementation in base 11 (or
however many edge instructions the NTM has) is quite similar. We wind the Tape Head up
the Tape to the first Δ and bounce off to the left. If the digit there is not yet an 11, increase it
by 1. If it is an 11, set it equal to 1 and move the Tape Head left to increase the next digit. If
this is not an 11, we are done. If it is, set it equal to 1 and repeat. If we get to $ having found
only 11’s, then we know that the string of 1’s we have created is too short (like going from
999 to 1000, only easier). So, we run up the Tape and add another 1 to the end of the non-Δ string.
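The incrementing routine can be sketched in Python, chosen here only for illustration. The sketch assumes that mother’s advice is held as a list of edge numbers, each between 1 and k, where k is the number of edge instructions of the NTM (11 in the discussion above); the name `next_advice` is hypothetical. It works from the right end, as the Tape Head does after bouncing off the first Δ, rolls each k over to 1, and lengthens the string when only k’s were found.

```python
def next_advice(advice, k):
    """Return the piece of advice that follows `advice`, where each entry is an
    edge number from 1 to k: shorter strings come first, then lexicographic order."""
    digits = list(advice)
    i = len(digits) - 1              # start at the right end, like the Tape Head
    while i >= 0:
        if digits[i] < k:            # not yet the top digit: add 1 and stop
            digits[i] += 1
            return digits
        digits[i] = 1                # top digit becomes 1; move left and carry
        i -= 1
    digits.append(1)                 # reached $ having seen only k's: lengthen
    return digits


# First six pieces of advice for a hypothetical NTM with two edge instructions.
advice = [1]
for _ in range(6):
    print(advice)
    advice = next_advice(advice, 2)
# prints [1], [2], [1, 1], [1, 2], [2, 1], [2, 2], one per line
```

Because every string over {1, . . . , k} eventually appears in this order, any path to HALT that exists will sooner or later be tried.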

Suppose someone asks us how we know to use base 11 and not some other number? 
Then we know that he has lost the point of what we are doing. We are initially presented 
with an NTM, and given it specifically, we are going to make a particular 3TM that will run 
on all inputs, not the same as the NTM does, but with the same result—acceptance only 
when the NTM accepts. We are allowed to examine the NTM before building our 3TM (it 
would be quite advisable to do so). This is when we discover how many edge instructions 
the NTM has and, therefore, when we learn how to design the mother’s advice-incrementing 
subprogram. 

Now suppose we have retrieved a piece of mother’s advice and it says to take edge
instruction 6. How do we actually do this on our 3TM? Some of the states in our 3TM must
have the meaning “in the simulation of the input we are in state x on the NTM and we must
now go seek mother’s advice,” and some of the states have the meaning “in the simulation
of the input on the NTM we are in state x and mother has just advised us to take edge y.” We
leave a state of the first type, find mother’s advice, and then arrive at a state of the
second type. While there, we make a detour to have the Tape Head find and read the next
letter of the simulation on track 1. Now we are all set. We are in a state that knows where we
are on the NTM, which edge we wish to follow, and what character is being read by the Tape
Head. Then, if possible, we execute that instruction; that is, we change the Tape cell
contents, move the Tape Head, and go to a 3TM state that represents the next NTM state the
instruction would have us enter. All this 3TM programming we can build from looking at the
NTM alone, without reference to any particular input string. There are only finitely many
possibilities for being in NTM state x and trying to follow instruction y, and they are
connected by 3TM edges in an obvious way.
Most likely, we cannot follow mother’s capricious advice (even though she has told us a
thousand times) in any particular situation. Her randomly chosen edge instruction has a low
probability of starting from the state we are in, and a lower one still of requiring the character
actually being read from track 1. Even then, the instruction we are asked to follow might
move the Tape Head inadvertently into cell i (the cushion $ there keeps the 3TM from crashing,
but it does mean the NTM would have crashed). In any of these events, mother’s advice turns out
to have been infelicitous, and we must wipe the slate clean and start again with the next advice.

However, we must always remember that if there actually is a path for this particular
input from START to HALT on the NTM, then there is some sequence of edge instructions
comprising that path, and sooner or later that very path will be mother’s advice. So every
word accepted by the NTM is accepted by the 3TM. If a given input has no path to
acceptance on the NTM, then the 3TM will run forever, testing one sequence of mother’s advice
after another ad infinitum. Nothing ever crashes on the 3TM; it just optimistically loops forever.

We have shown a TM can do what an NTM can do. Obviously, an NTM can do
anything that a TM can do, simply by not using the option of nondeterminism. ■
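The whole simulation can be sketched in Python under some simplifying assumptions of our own. The NTM is given as a list of edge instructions written as tuples (from, read, write, move, to), numbered 1, 2, 3, . . . by their position in the list; the list `tape` plays the role of track 1, `advice` the role of track 2, and `word` the pristine copy on track 3. The bound `max_len` is an addition so that the demonstration terminates; the 3TM in the proof has no such bound and simply keeps looping when the input is not accepted. All names here are hypothetical.

```python
from itertools import product

BLANK = "Δ"  # blank Tape cell


def follow_advice(edges, start, halt, word, advice):
    """Follow the edges named by `advice` (1-based) from START on a fresh copy
    of `word`. Return True only if this piece of advice reaches HALT."""
    tape = list(word)                 # track 1: working copy of the input
    head, state = 0, start
    for choice in advice:
        if state == halt:
            return True
        frm, read, write, move, to = edges[choice - 1]
        symbol = tape[head] if head < len(tape) else BLANK
        if frm != state or read != symbol:
            return False              # the edge does not apply here: bad advice
        while head >= len(tape):
            tape.append(BLANK)        # make room before writing past the input
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return False              # moved left of the first cell: NTM crash
        state = to
    return state == halt


def simulate_ntm(edges, start, halt, word, max_len):
    """Try every piece of advice, shortest first, then lexicographically."""
    k = len(edges)
    for n in range(1, max_len + 1):
        for advice in product(range(1, k + 1), repeat=n):
            if follow_advice(edges, start, halt, word, advice):
                return True
    return False  # the real 3TM would never stop trying


# Hypothetical NTM for words containing aa; state 3 is HALT.
edges = [
    (1, "a", "a", "R", 1),  # edge 1: skip an a
    (1, "b", "b", "R", 1),  # edge 2: skip a b
    (1, "a", "a", "R", 2),  # edge 3: guess that this a begins aa
    (2, "a", "a", "R", 3),  # edge 4: second a found, go to HALT
]
print(simulate_ntm(edges, 1, 3, "baab", 4))  # True
print(simulate_ntm(edges, 1, 3, "abab", 4))  # False
```

Each piece of advice is tried on a fresh copy of the input, which mirrors erasing track 1 and copying track 3 onto it before every attempt.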


The next theorem may come as a surprise, not that the result is so amazing but that it is 
strange that we have not proven this already. 

THEOREM 58 

Every CFL can be accepted by some TM. 

PROOF 
We know that every CFL can be accepted by some PDA (Theorem 30, p. 318) and that every
PDA PUSH can be written as a sequence of the PM instructions ADD and SHIFT-RIGHT
CYCLICALLY (p. 469). What we were not able to conclude before is that a PM could do
everything a PDA could do, because PDAs could be nondeterministic, whereas PMs could
not. If we convert a nondeterministic PDA into PM form, we get a nondeterministic PM.

If we further apply the conversion algorithm of Theorem 47 (p. 462) to this
nondeterministic PM, we convert the nondeterministic PM into a nondeterministic TM.

Using our last theorem, we know that every NTM has an equivalent TM. 

Putting all of this together, we conclude that any language accepted by a PDA can be
accepted by some TM. ■

THE READ-ONLY TM 

So far, we have considered only variations of the basic mathematical model of the TM that
do not affect the power of the machine to recognize languages. We shall now consider a
variation that does substantially hamper the capacity of the TM: the restriction that the Tape
Head can write nothing new on the Tape.

DEFINITION 

A read-only TM is a TM with the property that for every edge label in the program the
READ and WRITE fields are the same. This means that if the Tape Head reads an x, it must
write an x, no matter what x is. All edge labels, therefore, are of the form (x, x, y), where y is
either L or R. Because the Tape Head cannot change the contents of the Tape, the input
alphabet equals the output alphabet. The Tape Head can move back and forth over the
input string as much as it wants, but the contents of the Tape remain unchanged. ■

As a transducer, a read-only TM is very easy to describe: output = input. The interesting 
question is, “What types of languages can a read-only TM recognize as an acceptor?” 

It is conceivable that some advantage can be gained by reading some of the blank cells
to the right of the input string on the Tape before the machine decides to halt, loop, or crash,
but because nothing can be written in these cells, they cannot be used to store information.
Also, after the first Δ all the rest are known to be blank, and nothing about the particular
input string on the Tape can be learned from them. For these reasons, it is customary to
require a read-only TM to accept or reject a string by the time it has read its first Δ, if not
sooner.
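A read-only TM is easy to simulate, and a short Python sketch shows one consequence of the unchanging Tape. Because nothing is ever written, a deterministic read-only TM that comes back to the same state on the same cell must repeat itself forever, so a repeated (state, cell) pair signals a loop. The sketch assumes the machine is deterministic and models the convention above by treating a move past the first Δ as a crash; the dictionary format of `edges` (the write field is omitted because it always equals the read field) and the function name are hypothetical.

```python
BLANK = "Δ"


def run_read_only(edges, start, halt, word):
    """Run a deterministic read-only TM.

    edges maps (state, letter) -> (next_state, move), with move "L" or "R".
    Returns "accept", "reject" (crash), or "loop".
    """
    tape = list(word) + [BLANK]      # cells up to and including the first blank
    state, head = start, 0
    seen = set()                     # (state, cell) pairs already visited
    while True:
        if (state, head) in seen:
            return "loop"            # same state, same cell, same Tape: a cycle
        seen.add((state, head))
        key = (state, tape[head])
        if key not in edges:
            return "reject"          # no instruction applies: crash
        state, move = edges[key]
        head += 1 if move == "R" else -1
        if state == halt:
            return "accept"
        if head < 0 or head >= len(tape):
            return "reject"          # off the left end, or past the first blank


# Accepts words beginning ab, stepping back once to reread the a.
edges = {
    (1, "a"): (2, "R"),
    (2, "b"): (3, "L"),
    (3, "a"): (4, "R"),
    (4, "b"): (5, "R"),   # 5 is HALT
}
print(run_read_only(edges, 1, 5, "abba"))   # accept
print(run_read_only(edges, 1, 5, "ba"))     # reject
```

Since a fixed input of length n leaves only finitely many (state, cell) pairs, the `seen` set can never grow without bound.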

A read-only TM is sometimes called a two-way FA, because it acts like an FA in the sense 
that the transitions from state to state take place by reading without writing. The modifier “two- 
way” is intended to explain how letters can be reread once they have already been scanned. Our 
original model of the FA did not involve a Tape or Tape Head, and the letters were deemed to 
have been consumed by the machine once ingested. However, we could have begun our
discussion of mathematical models of computing with the TM (which was historically first) and then
defined the FA as a read-only one-way TM. One justification for calling a read-only TM an FA 
is that, unlike our other variations of the Turing model, the read-only machine does not have the 
same power as a TM but only the power of a standard FA, as we shall now prove. 

An FA and a PDA can read each letter of their input string only once, but the PDA has a
note pad on which it can record some facts about what it has read. We have seen that this
extra ability substantially increases its capacity to recognize languages. Although a read-only
TM does not have a note pad, if a question does arise at some point in the processing where
the machine must make a branching decision in the program based on some previously
available but forgotten information, the Tape Head can move back down the Tape to the left to
recheck what it had once read. The difficulty is that once it has done this, how is it ever
going to return to the exact spot on the Tape where the question first arose? The read-only Tape
Head is unable to leave a marker. When it scans back up the Tape to where the branch point
was encountered, it may well be going through a different sequence of states than it
traversed in its first trip up the Tape. We have seen situations in which the choice of the series
of states itself carried the required information. However, it is possible that, even with the
information in hand, the Tape Head still cannot relocate the Tape cell from which it started
backtracking. The additional freedom of motion of the Tape Head might not actually
increase the power of the machine as much as we may wish.

All of this very informal speculation suffers from excessive anthropomorphism and from the
pathetic fallacy. As we have noted before, a programmer’s inability to figure out
how to do something is not a proof that it cannot be done. It is not the machine that is unable
to return to the correct spot, but the human who constructed the program who might not be
able to figure out how to relocate the position or to employ special powers to make the
relocation unnecessary. Perhaps a more clever program can employ the back-and-forth ability of
read-only TMs to recognize all CFLs or some other more interesting set of languages. What
we need here is a mathematical proof.

Because we intend to show that a read-only TM can accept only regular languages,
perhaps a good way to do this is to show how to convert the whole machine into one regular
expression as we did in the proof of Kleene’s theorem in Chapter 7, by developing an elaborate
constructive algorithm. In order to turn FAs into expressions, we introduced the notion of a
generalized transition graph, which is an FA in which the edges are labeled with regular
expressions instead of single alphabet letters. With a little effort we shall show that this
strategy can be made to work in our present case as well.
To accomplish the conversion of the TM into a regular expression, we shall now define a
transition edge in a read-only TM to be an edge whose label has the form (r, D), where r is
a regular expression and D a Tape Head direction: L, R, or S. The meaning of the edge

[Figure: an edge from state 7 to state 3 labeled (ab*aa, R)]

is that if the machine is ever in state 7 and the cell being read on the Tape, possibly when
joined to the next few cells to the right of it, forms any string belonging to the language
defined by the regular expression ab*aa, then the Tape Head may move to the right across all
of those cells and the program will progress to state 3.

This is necessarily a nondeterministic option because a string of a’s could leave the
program below in two different states, depending on how many were read to get to state 3:


We must be careful to define what we mean by reading a string of letters to the left.
Suppose, moving leftward, we read the letter r followed by the letter a followed by the letter t. It
is logical to say that the string read is rat, but it is also logical, and more useful, to note that
the string traversed was tar, which sits on the Tape in that very order when read by our usual
convention of left to right. We shall adopt this second view. For example, starting with this
situation


if we traverse the edge below going from state 7 to state 3,

[Figure: an edge from state 7 to state 3 labeled (tar, L)]

we will end up with the Tape Head as indicated:


We can now define a transition Turing machine (TTM) to be a nondeterministic
read-only TM, which allows transition edges.
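The meaning of a transition edge can be made concrete with a small Python function that lists every cell where the Tape Head may land after following (r, R) or (r, L) from a given cell. It follows the conventions described above: a right move covers the starting cell and the cells after it and leaves the head on the next cell, a left move covers the starting cell and the cells before it, and the covered string is always read left to right as it sits on the Tape. The sketch assumes the Tape is given as a Python string and uses the syntax of Python’s `re` module, which writes union as | where the text writes +; the function name is hypothetical, and the stay option S is not handled. A result with more than one cell is exactly the nondeterministic option discussed above.

```python
import re


def transition_targets(tape, head, regex, move):
    """List the cells where the Tape Head may land after following the
    transition edge (regex, move) from cell `head` (cells numbered from 0).

    A right move covers cells head .. end-1 and lands on cell end; a left move
    covers cells start .. head and lands on cell start - 1 (-1 means the head
    fell off the left end). Either way the covered string is read left to right.
    """
    targets = []
    if move == "R":
        for end in range(head + 1, len(tape) + 1):
            if re.fullmatch(regex, tape[head:end]):
                targets.append(end)
    elif move == "L":
        for start in range(head, -1, -1):
            if re.fullmatch(regex, tape[start:head + 1]):
                targets.append(start - 1)
    return targets


# Two possible landing cells, hence a nondeterministic option.
print(transition_targets("abaabaa", 0, "(a|b)*aa", "R"))   # [4, 7]
# Moving left over r, then a, then t: the string traversed is "tar".
print(transition_targets("xtar", 3, "tar", "L"))            # [0]
```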

Let us clear up one possible point of confusion. It makes no sense in the definition of
transition edge to allow the regular expression to be Λ, the null string, because this
would mean that the Tape Head would move without passing over any letters in the cells,
which is obviously impossible.

Let us recall the main operation in the analogous part of the proof of Kleene’s theorem
(p. 96): the process of bypassing a state in the transition graph by hooking up all the edges
that lead into the state with all the edges that lead out of the state in all possible ways so as
to make that state unnecessary to the operation of the machine. By reiterating this procedure,
we were able to eliminate, one by one, all the states except for the start state and one final
state. From the label of the edge between these two, we could then read off the regular
expression equivalent to the language accepted by the machine. Our question is whether, by
employing the model of the TTMs, we are able to imitate the steps in the proof of Kleene’s
theorem and produce a regular expression equivalent to the language accepted by any given
read-only TM.

If we wish to connect an incoming right-moving edge with an outgoing right-moving
edge, the situation is completely analogous to the case of Kleene’s theorem.

[Figure: an edge (r₁, R) from state 7 to state 3 followed by an edge (r₂, R) from state 3 to state 11]

is equivalent to

[Figure: a single edge (r₁r₂, R) from state 7 to state 11]

in exactly the same sense that we were able to make this substitution for TGs. Any word
from the language of r₁r₂ could take us from state 7 to state 3 to state 11 if we parsed it
correctly. This then represents a nondeterministic option. If there is a different way of parsing
the expression that causes the input to crash, so be it. Acceptance by nondeterministic
machines means that there is some way to reach HALT, not that all paths do.

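We can even handle the case of a right-moving loop at the middle state without any
worry.

Clearly,

[Figure: a right-moving path from state 7 through state 3 to state 11, with a loop labeled (r₂, R) at state 3]

is equivalent to

[Figure: a single right-moving edge from state 7 to state 11 whose label has r₂* in the middle of the concatenation]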

The amalgamation of left-moving edges is similar but with a slight twist. The path
below:

[Figure: an edge (r₁, L) from state 7 to state 3 followed by an edge (r₂, L) from state 3 to state 11]

is equivalent to

[Figure: a single edge (r₂r₁, L) from state 7 to state 11]

The reason for the reversal in the concatenation is that the read field in the edge label
indicates the combined string as it appears from left to right on the Tape. In going from state 7 to
state 3, we might traverse a section of the Tape containing the letters ward (first the d, then
the r, then the a, then the w), and then, while going from state 3 to state 11, we might traverse
the letters back (first the k . . .). Altogether, we have then traversed the string backward.
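The reversal r₂r₁ can be checked by brute force. The Python sketch below compares following (r₁, L) and then (r₂, L) with following the single edge (r₂r₁, L), first on the ward/back example and then on every short Tape over {a, b}. The expressions r₁ = ab* and r₂ = a + ba (written a|ba in Python’s `re` syntax) are arbitrary choices of ours; we pick expressions that do not match the null string, since a left move must pass over at least one cell. The helper names are hypothetical.

```python
import re
from itertools import product


def left_targets(tape, head, regex):
    """Cells where the head may land after a left-moving edge (regex, L) from
    `head`: cells start..head are read left to right, then the head stands on
    cell start - 1 (-1 means it fell off the left end)."""
    return {start - 1 for start in range(head, -1, -1)
            if re.fullmatch(regex, tape[start:head + 1])}


def two_left_edges(tape, head, r1, r2):
    """Landing cells after (r1, L) and then (r2, L); a crash after r1 ends the path."""
    return {end
            for mid in left_targets(tape, head, r1) if mid >= 0
            for end in left_targets(tape, mid, r2)}


# The ward/back example: "ward" is crossed first, then "back".
print(two_left_edges("xbackward", 8, "ward", "back"))   # {0}
print(left_targets("xbackward", 8, "backward"))         # {0}

# Brute-force check on every tape over {a, b} of length 1 to 6.
r1, r2 = "ab*", "a|ba"
combined = f"(?:{r2})(?:{r1})"                          # the single label r2 r1
for n in range(1, 7):
    for letters in product("ab", repeat=n):
        tape = "".join(letters)
        for head in range(n):
            assert two_left_edges(tape, head, r1, r2) == left_targets(tape, head, combined)
```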

The case of a left-moving loop at the middle state can be handled exactly as the loop in 
the right-moving case; that is, it introduces a starred regular expression in the concatenation. 

The real problem comes in figuring out what the net effect might be of combining two
edges that move the Tape Head in opposite directions. Let us consider what we can do with

[Figure: an edge (r₁, R) from state 7 to state 3 followed by an edge (r₂, L) from state 3 to state 11]

First, the Tape Head moves up the Tape to the right, scanning over a string from the
language of r₁; then it moves leftward down the Tape, covering a string from the language of r₂.
Considering the freedom possible between these two regular expressions, we have no way of
telling whether the Tape Head ends up to the right or left of where it started. It is even
possible that after all this travel it is back where it started, reading the same cell in state 11 that it
was reading in state 7.

If we are to replace this sequence of two edges with one edge running from state 7 to
state 11, that one edge must have three labels allowing for the three possibilities of motion of
the Tape Head. The new edge must have the form

[Figure: a single edge from state 7 to state 11 with the three labels (. . . , R), (. . . , L), and (. . . , S)]

Note that we must allow for the possibility of the stay option discussed earlier in this
chapter. The question now is which regular expressions we are going to fill in where the
dots are.

Let us first consider the situation where the Tape Head ends up to the right of where it
started. The string that appears to have been the one traversed is not all the letters that were
covered going up the Tape to the right and then partially back down to the left, but only
those letters that were not read twice. For example, if the Tape situation is

p | q | r | s | t | u    (Tape Head reading p, in state 7)

and the two edges executed are

(pqrst, R) from state 7 to state 3
(stu, L) from state 3 to state 11

then by state 3 the situation is

p | q | r | s | t | u    (Tape Head reading u)

and by state 11 it is

p | q | r | s | t | u    (Tape Head reading r)

which is equivalent to the execution of the single instruction

(pq, R) from state 7 to state 11



This situation is a little more subtle than we might have imagined. We would like to
have been able to invoke Theorem 16 (p. 202), the division theorem for regular languages, to
say that if we follow a word from r₁ going up to the right, and then come back down to the
left over a word in r₂, the result is the same as covering a word from the language
Pref(r₂ in r₁), which, as we recall, is the language of prefixes that, when added to some
words in r₂, make them into some words in r₁. However, as we can see from the preceding
example, after the Tape Head has moved over the string pqrst, it is pointing to the cell after
the last of these letters. If the Tape Head moves to the right over a word from r₁, the next
letter it reads is no longer part of the word from r₁ but a new arbitrary letter unanticipated by
the language r₁. In the preceding example, this is the letter u.

It is also true that when the Tape Head moves down the Tape to the left, covering a
word from the language of r₂, the cell it ends up pointing to contains a letter (the letter r in
the preceding example) that is neither part of the r₂ string nor part of the short-form
agglomerated instruction (pq, R). An end letter (u) is added and a middle letter (r) is wasted.
Therefore, if we want to write

[Figure: a single edge (r₃, R) from state 7 to state 11]

it is inaccurate to claim that

$$ r_3 = \mathrm{Pref}(r_2 \text{ in } r_1) $$

without some substantial modification.

The total string of cells read by the Tape Head is not just the word from r₁ but one cell
more than that. If this cell contains a blank, then the processing is over. The only other
possibility is that this cell contains an a or b, if we assume that is the total alphabet to be found on
the Tape, in which case the total string of letters involved is a word in the language defined
by the regular expression

$$ r_1(a + b) $$

It is also clear that the string of letters read only once (the pq in the earlier example) is not
the prefix left when a word from r₂ is removed, but the prefix left when one letter more is
removed. In fact, it is the prefix left over when a string from the language defined by the
regular expression

$$ (a + b)r_2 $$

has been removed. The accurate definition of r₃ is then

$$ r_3 = \mathrm{Pref}\bigl((a + b)r_2 \text{ in } r_1(a + b)\bigr) $$

By Theorem 16, we know that this prefix language is regular and must therefore be definable
by some regular expression, which we can call r₃.

This accounts for the situations in which the Tape Head ends up to the right of where it
started, but it is also possible that after reading up the Tape over a word in r₁ and then down
over a word in r₂, it ends up to the left of where it started. As an example of this, let us
consider the following situation. Start with







CHAPTER 22 Variations on the TM 




(In this diagram, as in all diagrams in this section, all the letters must be either a’s or b’s,
because the only thing ever found on the Tape in a read-only TM is the untouched initial
input.)

If we execute the two instructions

(pqrst, R) from state 7 to state 3
(nopqrstu, L) from state 3 to state 11


the net result is to leave the situation

[Figure: the same Tape, with the Tape Head now reading the cell just to the left of n]

which is equivalent to having executed the one instruction

(nop, L) from state 7 to state 11

As before, we wish to replace the two instructions

(r₁, R)    (r₂, L)

with one instruction of the form

(r₃, L)

where r₃ is a regular expression defining the appropriate language. It is almost true that r₃ is
the language of prefixes that, when added to the front of words in r₁, give us words in r₂.
However, as before, we must add an extra letter onto the end of the string in r₁ to account for
the fact that r₂ will include the cell immediately to the right of the r₁ string. But this alone is
not enough of an adjustment.

We can see from the example above that the letter p is read going up the Tape to the
right and read going down the Tape to the left, and yet it is still the first letter in the resultant
r₃ move. The string nop is, in fact, the prefix of the string qrstu in the word nopqrstu. Instead
of subtracting exactly the words in r₁ from the string in r₂, what we need to do is subtract all
but the first letter of the r₁ word, so that this letter will still be there for r₃ to read.

If we wish to define r₃ as a language of prefixes of words in r₂, the part removed from the
end of each word is a word of the language formed by taking each word in r₁, chopping off
its first letter, and adding a new last letter. Let us call this language Chop(r₁)(a + b). The
correct definition of r₃ is then

$$ r_3 = \mathrm{Pref}\bigl(\mathrm{Chop}(r_1)(a + b) \text{ in } r_2\bigr) $$

We may be tempted to ask whether Chop(r₁) is a regular language. It so happens
that it is, as anyone who does the exercises at the end of this chapter will discover. But
we can apply Theorem 16 without knowing this fact. The language Pref(Q in R) was shown
to be regular whenever R is regular, no matter what flavor Q comes in. Here Q is certainly some
language, and that is all we need to know.

Therefore, we have shown that there is some regular expression r₃ that we can use in the
edge label to make a single edge (r₃, L) from state 7 to state 11 the equivalent of the edge
(r₁, R) from state 7 to state 3 followed by the edge (r₂, L) from state 3 to state 11,
whenever the Tape Head ends up to the left of the cell from which it started. Let us note
clearly here that we have presented a proof of the existence of such a regular expression
without providing a constructive algorithm for producing it from r₁ and r₂.

The last case we have to consider is the one where the Tape Head ends up in state 11
back at the same cell from which it started. It reads some word from r₁ going up the Tape to
the right on its way to state 3 and then reads some word from Chop(r₁(a + b)) on its way to
state 11. The net result is that what was read was Λ. This is described by the edge

(Λ, S) from state 7 to state 11

which need only be included as an option when Chop(r₁(a + b)) and r₂ have a word in
common. Therefore, the full description of the results of the edge (r₁, R) from state 7 to
state 3 followed by the edge (r₂, L) from state 3 to state 11, when summarized as one edge
from state 7 to state 11, is an edge carrying the three labels

(Pref[(a + b)r₂ in r₁(a + b)], R)
(Pref[Chop(r₁(a + b)) in r₂], L)
(Λ, S)

the last option existing only if there is a word in common between Chop(r₁(a + b)) and r₂.
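For finite languages, the three labels can be computed straight from the definitions of Pref and Chop, which gives a handy check on the two worked examples. In the Python sketch below, sets of strings stand in for the languages r₁ and r₂, and `SIGMA` is an assumed Tape alphabet that treats the letters of the worked examples as distinct symbols, so a + b becomes "any letter of SIGMA". This only illustrates the definitions on finite sets; it does not reproduce the regularity argument of Theorem 16, and all names are hypothetical.

```python
def pref(Q, R):
    """Pref(Q in R) for finite sets: every p such that p + q is in R for some q in Q."""
    return {r[:len(r) - len(q)] for r in R for q in Q if r.endswith(q)}


def chop(words):
    """Chop: remove the first letter of every nonempty word."""
    return {w[1:] for w in words if w}


def concat(A, B):
    """All concatenations a + b with a in A and b in B."""
    return {a + b for a in A for b in B}


SIGMA = set("mnopqrstu")  # assumed Tape alphabet for these examples


def summarize(r1, r2):
    """The (R, L, S) labels replacing (r1, R) followed by (r2, L)."""
    right = pref(concat(SIGMA, r2), concat(r1, SIGMA))  # Pref[(a + b)r2 in r1(a + b)]
    left = pref(chop(concat(r1, SIGMA)), r2)            # Pref[Chop(r1(a + b)) in r2]
    stay = bool(chop(concat(r1, SIGMA)) & r2)           # is (Λ, S) needed?
    return right, left, stay


print(summarize({"pqrst"}, {"stu"}))       # ({'pq'}, set(), False)
print(summarize({"pqrst"}, {"nopqrstu"}))  # (set(), {'nop'}, False)
```

The first call recovers the instruction (pq, R) and the second the instruction (nop, L), matching the two examples worked above.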

This completely handles the situation in which we wish to replace a right-moving edge
followed by a left-moving edge by one single edge, albeit with multiple labels. The only detail
left is showing how to replace a left-moving edge followed by a right-moving edge by one
single edge; we do this with mirrors. Abracadabra, we are done (cf. p. 534).

We have therefore proven the following. 


THEOREM 59 

A read-only TM, also known as a two-way FA, accepts exclusively regular languages. ■ 

This result was proven by Rabin and independently by J. C. Shepherdson. Because the
proof depends heavily on the nonconstructive step of finding the regular expressions for the
prefix languages, we are spared the trouble of illustrating the technique with a concrete example.


PROBLEMS 




1. Convert these TMs to move-in-state machines: 
2. (i) Draw a move-in-state machine for the language ODDPALINDROME.

(ii) Draw a move-in-state machine for the language {aⁿbⁿ}.

(iii) Draw a move-in-state machine for the language EQUAL. 

(iv) Draw a move-in-state machine for the language: all words of odd length with a as 
the middle letter, MIDDLEA. 

3. Discuss briefly how to prove that multiple-cell-move instructions, such as (x, y, 5R) and
(x, y, 17L) mentioned on p. 502, do not increase the power of a TM.

4. In the description of the algorithm for the 3TM that does decimal addition “the way
humans do,” we skimmed too quickly over the conversion of data section. The input is
presumed to be placed on track 1 as two numbers separated by delimiters, for example,



The question of putting the second number onto the second track is a problem that we 
ignored in the discussion in the chapter. Write a 3TM subprogram to do it. 

5. In the proof of Theorem 55 (p. 506), where kTM = TM, we used two different methods
for storing the k tracks on the one TM tape. One was interlacing the tracks, and the
other was using a vector alphabet. There is a third, more simplistic method: Store the
working section of each of the k tracks sequentially, separated by markers. Show that
this model can simulate a kTM for some arbitrary k. What other markers will be
needed?

6. (i) Outline a 5TM that does decimal addition for three numbers simultaneously, the
numbers being on tracks 2, 3, and 4. The sum should be left on track 5, and track 1
is reserved for carries.

(ii) Outline a 4TM that does the same task without the need for carries.

7. Outline a 5TM that multiplies two binary numbers initially on tracks 1 and 2. The
product should be placed on track 3, using tracks 4 and 5 as a working area.

8. Design a 2TM that accepts DOUBLEWORD in the following two steps: 

(i) Draw a 2TM that finds the middle letter of an input string of even length. Track 1 
consists of just the input string. The program should place two markers on track 2, y 
below the first letter in the string and z below the last letter. Next, the program 
should bring the two markers toward each other one cell at a time. Let the program 
crash on odd-length strings. Finally, erase the y marker. 

(ii) Using the above 2TM as a preprocessor, complete the machine to recognize
DOUBLEWORD. Reinsert the y marker at the front of the string, and, moving the
markers to the right one cell at a time, compare the letters.

9. (i) Outline two procedures for a 3TM, to INSERT or DELETE a character from track 2
only, leaving the other tracks unchanged.

(ii) Draw a 3TM that accepts the language EQUAL by splitting the a’s and b’s of the
input on track 1 onto tracks 2 and 3 separately and then comparing them.

10. Design a pattern-matching 2TM. The input is a long string on track 1 and a short
string on track 2. The program halts only if the string on track 2 is a substring of the
string on track 1.

11. On a 2TM, track 1 contains a string of the form (a + b)⁺, which is to be interpreted as a
unary representation of numbers as strings of a’s, separated by single b’s.

(i) Using a 2TM, find the largest of the numbers on track 1 and copy it to track 2. 

(ii) Using a 3TM, sort the list in descending order. 

12. Outline a 2TM that takes as input on track 1 aⁿ and leaves on track 2 the binary
representation of n.

13. (i) Outline a 6TM that determines whether its binary input on track 1 is a perfect
square by generating squares and comparing them to the input number. The
program terminates when the square is found or the length of the track 1 square is
greater than the length of the input number.

(ii) Outline a 7TM that accepts the language

SQUARE = {aⁿ | n is a square} = {a, aaaa, aaaaaaaaa, . . .}

(See p. 204.)

14. Draw a ATM that accepts MOREA (p. 205). 

15. Outline an argument that shows how a two-way TM could be simulated on a TM using
the trick of interlacing cells on the Tape. That is, the Tape starts with a $ in cell i, and
then cell ii represents cell 0 on the two-way TM, cell iii on the TM represents cell i on
the two-way TM, cell iv on the TM represents cell −i on the two-way TM, cell v
represents cell ii, and so on. Show how to simulate the two-way TM instructions on this
arrangement for a TM.

16. On a certain two-way TM, the input is the single letter a surrounded by all Δ’s.
Unfortunately, the Tape Head is somewhere else on the Tape and we do not know where. Our
job is to arrange for the Tape Head to find the a.

(i) Show that if the two-way TM is nondeterministic, the problem is easy. 

(ii) Show that if the two-way TM has two tracks, the problem can be solved. 

(iii) Outline a solution for the one-track deterministic two-way TM. 

17. (i) Outline a proof that a nondeterministic PM has the same power as a regular PM. 

(ii) Outline a proof that a nondeterministic 2PDA has the same power as a regular
2PDA.

18. (i) If we had introduced the proof that kTMs were the same as TMs earlier, would it
have made the proof that PM = TM, or that 2PDA = TM, any easier?

(ii) If we had introduced the proof that NTM = TM earlier, would it have made the
proof that PM = TM, or that 2PDA = TM, any easier?

19. Prove that if r is a regular language, Chop(r), defined as the language of all non-Λ
words in r with their first letter removed, is also regular.



20. Complete the proof of Theorem 59 (p. 531). 

(i) Show the details of how to replace a left-moving edge followed by a right-moving 
edge with a single edge. 

(ii) Explain what can be done about loops. 
RECURSIVELY ENUMERABLE LANGUAGES

We have an independent name and an independent description for the languages accepted by 
FAs: The languages are called regular, and they can be defined by regular expressions. We 
have an independent name and an independent description for the languages accepted by 
PDAs: The languages are called context-free, and they can be generated by context-free 
grammars. We are now ready to discuss the characteristics of the languages accepted by 
TMs. They will be given an independent name and an independent description. The name 
now; the description later. 


DEFINITION 

A language L over the alphabet Σ is called recursively enumerable if there is a TM T that
accepts every word in L and either rejects (crashes) or loops forever for every word in the
language L', the complement of L.

accept(T) = L

reject(T) + loop(T) = L' ■

EXAMPLE 

The TM drawn on p. 446 divided all inputs into three classes:

accept(T) = all words with aa
reject(T) = all strings without aa ending in a
loop(T) = all strings without aa ending in b, or Λ

Therefore, the language (a + b)*aa(a + b)* is recursively enumerable. ■

A more stringent requirement for a TM to recognize a language is given by the
following.
DEFINITION

A language L over the alphabet Σ is called recursive if there is a TM T that accepts every
word in L and rejects every word in L'; that is,

accept(T) = L
reject(T) = L'

loop(T) = φ ■

EXAMPLE 

The following TM accepts the language of all words over {a, b} that start with a and
crashes on (rejects) all words that do not.



Therefore, this language is recursive. ■ 

The term “recursively enumerable” is often abbreviated “r.e.,” which is why we never
gave an abbreviation for the term “regular expression.” The term “recursive” is not usually
abbreviated. It is obvious that every recursive language is also recursively enumerable,
because the TM for the recursive language can be used to satisfy both definitions. However, we
shall soon see that there are some languages that are r.e. but not recursive. This means that
every TM that accepts these languages must have some words on which it loops forever.

We should also note that we could have defined r.e. and recursive in terms of PMs or 
2PDAs as well as in terms of TMs, because the languages that they accept are the same. It is 
a point that we did not dwell on previously, but because our conversion algorithms make the 
operations of the machines identical section by section, any word that loops on one will also 
loop on the corresponding others. If a TM, T, is converted by our methods into a PM, P, and
a 2PDA, A, then not only does 

accept(T) = accept(P) = accept(A) 

but also 

loop (T) = loop(P) = loop(A) 
and 

reject(T) = reject(P) = reject (A) 

Therefore, languages that are recursive on TMs are recursive on PMs and 2PDAs as 
well. Also, languages that are r.e. on TMs are r.e. on PMs and 2PDAs, too. 

Turing used the term “recursive” because he believed, for reasons we discuss later, that
any set defined by a recursive definition could be accepted by a TM. We shall also see that he
believed that any calculation that could be defined recursively by algorithm could be
performed by TMs. That was the basis for his belief that TMs are a universal algorithm device.
The term “enumerable” comes from the association between accepting a language and listing 
or generating the language by machine. To enumerate a set (say, the squares) is to generate 
the elements in that set one at a time (1, 4, 9, 16, . . .). We take up this concept again later. 

There is a profound difference between the meanings of recursive and recursively
enumerable. If a language is regular and we have an FA that accepts it, then if we are presented
a string w and we want to know whether w is in this language, we can simply run it on the
machine. Because every state transition eats up a letter from w, in exactly length(w) steps we
have our answer. This we have called an effective decision procedure. However, if a
language is r.e. and we have a TM that accepts it, then if we are presented a string w and we
would like to know whether w is in the language, we have a harder time. If we run w on the
machine, it may lead to a HALT right away. On the other hand, we may have to wait. We
may have to extend the execution chain seven billion steps. Even then, if w has not been
accepted or rejected, it still eventually might be. Worse yet, w might be in the loop set for this
machine, and we shall never get an answer. A recursive language has the advantage that we
shall at least someday get the answer, even though we may not know how long it will take.
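To make the contrast concrete, here is a minimal Python sketch of a table-driven TM simulator that gives up after a fixed step budget. The budget parameter `fuel`, the helper `run`, and the machine `AA_MACHINE` are our own illustrations, not constructions from the text. `AA_MACHINE` is a hypothetical TM built to fall into the same three classes as the TM on p. 446 (we do not claim it is that machine): it accepts words with aa, crashes on words without aa ending in a, and runs right forever on the rest. We write Δ for the blank and choose state 1 as START and state 2 as HALT.

```python
from typing import Dict, Optional, Tuple

BLANK = "Δ"
# (state, character read) -> (character written, move "L" or "R", next state)
Table = Dict[Tuple[int, str], Tuple[str, str, int]]

# Hypothetical TM with the three-way behavior described above; state 2 is HALT.
AA_MACHINE: Table = {
    (1, "a"): ("a", "R", 3),          # just read an a
    (1, "b"): ("b", "R", 1),
    (1, BLANK): (BLANK, "R", 1),      # input ended in b (or was empty): move right forever
    (3, "a"): ("a", "R", 2),          # second a in a row: HALT
    (3, "b"): ("b", "R", 1),
    # no edge for (3, BLANK): a word without aa that ends in a crashes here
}

def run(table: Table, word: str, fuel: int, halt: int = 2) -> Optional[bool]:
    """Return True (accept), False (crash) or None (no verdict within `fuel` steps)."""
    tape = dict(enumerate(word))      # cell i is index 0; cells never written are blank
    state, head = 1, 0
    for _ in range(fuel):
        if state == halt:
            return True
        edge = table.get((state, tape.get(head, BLANK)))
        if edge is None:
            return False              # no exit edge for this character: crash
        write, move, state = edge
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return False              # moved left from cell i: crash
    return True if state == halt else None
```

A result of `None` only says the budget ran out. For the input ab no budget will ever produce an answer, yet from the outside that looks exactly like a word that would have been accepted one step later.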

We have seen some examples of TMs that do their jobs in very efficient ways. There are 
some TMs, on the other hand, that take much longer to do simple tasks. We have seen a TM 
with a few states that can accept the language PALINDROME. It compares the first and last 
letter on the Input Tape, and, if they match, it erases them both. It repeats this process until 
the Tape is empty and then accepts the word. 

Now let us outline a worse machine for the same language: 

1. Replace all a’s on the Tape with the substring bab.

2. Translate the non-Δ data up the Tape so that it starts in what was formerly the cell of the
last letter.

3. Repeat step 2 one time for every letter in the input string.

4. Replace all b’s on the Tape with the substring aabaa.

5. Run the usual algorithm to determine whether or not what is left on the Tape is in
PALINDROME.

The TM that follows this algorithm also accepts the language PALINDROME. It has 
more states than the first machine, but it is not fantastically large. However, it takes many, 
many steps for this TM to determine whether aba is or is not a palindrome. While we are 
waiting for the answer, we may lose patience and mistakenly think that the machine is going 
to loop forever. If we knew that the language was recursive and the TM had no loop set, then 
we would have the faith to wait for the answer. 
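Because steps 2 and 3 only slide the data along the Tape, the net effect of the outline can be checked at the level of strings. The following minimal Python sketch (the function names are our own, and modeling the shifts as leading blanks is an assumption) follows the five steps and compares the result with a direct palindrome test on every word up to a given length. It checks correctness only; it says nothing about how many steps the actual TM takes.

```python
from itertools import product

BLANK = "Δ"  # blank Tape cell

def worse_palindrome_test(word: str) -> bool:
    """String-level model of the five-step outline (hypothetical helper, not the TM itself)."""
    tape = word.replace("a", "bab")                  # step 1
    for _ in word:                                   # step 3: once per input letter...
        data_length = len(tape.strip(BLANK))
        tape = BLANK * (data_length - 1) + tape      # ...do step 2: slide the data right
    tape = tape.replace("b", "aabaa")                # step 4
    data = tape.strip(BLANK)                         # the leading blanks carry no information
    return data == data[::-1]                        # step 5: the usual palindrome test

def agrees_with_palindrome(max_length: int) -> bool:
    """Compare with a direct palindrome check on all words over {a b} up to max_length."""
    for length in range(max_length + 1):
        for letters in product("ab", repeat=length):
            w = "".join(letters)
            if worse_palindrome_test(w) != (w == w[::-1]):
                return False
    return True
```

The two tests agree because each substituted string (bab in step 1, aabaa in step 4) is itself a palindrome and the combined substitution can be undone, so the final Tape holds a palindrome exactly when the input was one.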

A TM that accepts a recursive language may still have a loop set. A language is recursive if
at least one TM accepts it and rejects its complement. Some other TMs that accept the same
language might loop on some inputs.

Let us make some observations about the connection between recursive languages and 
r.e. languages. 

THEOREM 60 

If the language L is recursive, then its complement L' is also recursive. In other words, the
recursive languages are closed under complementation.

PROOF 

It is easier to prove this theorem using PMs than TMs. Let us take a language L that is
recursive. There is then some PM, call it P, for which all the words in L lead to ACCEPT and all
the words in L' crash or lead to REJECT. No word in Σ* loops forever on this machine.

Let us draw in all the REJECT states so that no word crashes but, instead, is rejected by
landing in a REJECT. To do this for each READ, we must specify an edge for each possible
character read. If any new edges are needed, we draw




Now if we reverse the REJECT and ACCEPT states, we have a new machine that takes
all the words of L' to ACCEPT and all the words of L to REJECT and still never loops.

Therefore, L' is shown to be recursive on this new PM. We used the same trick to show 
that the complement of a regular language is regular (Theorem 11), but it did not work for 
CFLs because PDAs are nondeterministic (Theorem 40, p. 387). ■ 
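The two moves in this proof (complete every READ with REJECT edges, then exchange ACCEPT and REJECT) can be sketched in Python. As a simplifying assumption we use a table-driven machine with explicit ACCEPT and REJECT halt states, in the spirit of a PM, rather than a full PM; the helpers `complete`, `complement`, `accepts` and the sample machine `STARTS_WITH_A` are hypothetical. The sample machine only moves right, so the left-of-cell-i crash discussed in the Observation below does not arise here. The trick is only sound when the machine never loops, which is exactly the hypothesis of the theorem.

```python
from typing import Dict, Tuple, Union

BLANK = "Δ"
TAPE_CHARS = ("a", "b", BLANK)
State = Union[int, str]
Table = Dict[Tuple[State, str], Tuple[str, str, State]]   # (state, read) -> (write, move, next)
HALTS = {"ACCEPT", "REJECT"}

def complete(table: Table) -> Table:
    """Add an edge to REJECT for every missing (state, character) pair, so no READ can crash."""
    states = {s for s, _ in table} | {nxt for _, _, nxt in table.values()}
    full = dict(table)
    for s in states - HALTS:
        for c in TAPE_CHARS:
            full.setdefault((s, c), (c, "R", "REJECT"))
    return full

def complement(table: Table) -> Table:
    """Exchange the roles of ACCEPT and REJECT."""
    swap = {"ACCEPT": "REJECT", "REJECT": "ACCEPT"}
    return {key: (w, m, swap.get(nxt, nxt)) for key, (w, m, nxt) in table.items()}

def accepts(table: Table, word: str) -> bool:
    """Run a machine that is assumed never to loop; a crash counts as rejection."""
    tape, state, head = dict(enumerate(word)), 1, 0
    while state not in HALTS:
        if head < 0:
            return False                      # moved left from cell i: crash
        edge = table.get((state, tape.get(head, BLANK)))
        if edge is None:
            return False                      # no edge for this character: crash
        write, move, state = edge
        tape[head] = write
        head += 1 if move == "R" else -1
    return state == "ACCEPT"

# Hypothetical sample machine: accepts exactly the words that start with a.
STARTS_WITH_A: Table = {(1, "a"): ("a", "R", "ACCEPT")}
```

Skipping `complete` would break the argument: `complement(STARTS_WITH_A)` still crashes on b and on the empty word, so it accepts nothing at all.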

We cannot use the same argument to show that the complement of a recursively
enumerable set is recursively enumerable, since some input string might make the PM loop forever.
Interchanging the status of the ACCEPT and REJECT states of a PM keeps the same set of
input strings looping forever, so they will be undecided.

Observation 

The reason it is easier to prove this theorem for a PM than for a TM is that not all TM
rejections are caused by being in a state and having no exit edge labeled for the Tape character
being read. Some crashes are caused by moving the Tape Head left while in cell i. Crashes
of this sort can be converted into the more standard type of crash by inserting a marker in 
cell i that would then stand for crashing by going left of cell i; this would be a special 
marker to the left of any other end-of-TAPE marker that the program would want to insert. If 
that marker is ever read, we would be transferred to a TM state with no outgoing edges 
whatsoever. In this state, we would crash in the usual TM way, by being unable to exit from 
a non-HALT state. This method of unifying TM crashes will be useful for us later. 

Just because the TM we know for a particular language has a loop set does not mean
that every TM for that language has one. Nor does it mean that we actually have to find a TM
that never loops in order to establish that the language is recursive.













THEOREM 61 

If L is r.e. and L' is also r.e., then L is recursive.


PROOF 

From the hypotheses, we know that there is some TM, say, T1, that accepts L and some TM,
say, T2, that accepts L'. From these two machines we want, by constructive algorithm, to
build a TM, call it T3, that accepts L and rejects L', because then T3 would be the machine
that proves L is recursive.

The first thing we want to do is change T2 so that it rejects L' and only L'. It is not
enough to turn the HALT state into a reject state; we must also be sure that it never crashes
on any of the words it used to crash on. The words it formerly looped on are fine because
they are not in L' and they can still loop forever. The new machine we want, call it T2', has
the following characteristics:


L' = accept(T2) = reject(T2')
loop(T2) ⊂ loop(T2')
reject(T2) ⊂ loop(T2')

To do this we must eliminate all the crashes. The crash that occurs from moving the
Tape Head left from cell i can be made into a typical TM crash, that is, being in a
non-HALT state but being unable to exit. This can be accomplished by the trick mentioned in the
preceding observation. But this is not enough for our purposes here because we must
eliminate all the crashes in total and change them to loop-forevers. This we do by going state by
state and finding every character that has no existing exit edge and drawing a new one going
to a new state called NOWHERESVILLE on an edge labeled (it, =, R). For example, if a
state had no b exit edge, we would draw one to NOWHERESVILLE labeled (b, b, R). Once
we get to NOWHERESVILLE, of course, we are stuck there, because it has only one exit
edge that is a loop labeled (any, =, R). So once in NOWHERESVILLE, we spend an
eternity slowly inching our way up the Tape. The machine now has the same accept set, but the
reject set has been merged into the loop set.

Now we want to make the accept set a reject set. This is easy. We accept an input by
arriving at a HALT state. If we erase the edges that lead into the HALT states, then when the
program is in the states that would naturally have fed into the HALTs, given what the Tape
Head is reading, a crash would occur instead, and the input will be rejected. This then is our
T2'. It accepts nothing, rejects exactly L', and loops on every word of L.

We also want to modify T1 in a similar way so that its accept set remains the same, that
is, L, but its reject set is merged into its loop set so that it too never crashes. This we
accomplish by adding its own NOWHERESVILLE. Call this modified TM T1'.

What we now have can be summarized as 

accept(T1') = L = loop(T2')
loop(T1') = L' = reject(T2')

Very simply, what we would like T3 to do is to run the input string simultaneously on T1'
and T2'. If the input string is in the language L, sooner or later it will be accepted by T1'; if it
is in the language L', it will, sooner or later, be rejected by T2'. And while we are waiting for
one of these two events to occur, the nondeciding machine will not interrupt us by crashing.
Now, because we cannot actually run the same input string on the two TMs simultaneously
(they might want to change the Tape into incompatible things), the next best thing we can do
is simulate running the input on the two machines alternately. That is, we take the first edge
on T1', then the first edge on T2', then the second edge on T1', then the second edge on T2',
then the third edge on T1', and so on, until either T1' takes us to HALT or T2' crashes. A
machine like this is actually possible to build, and we will do it now.

Let us for convenience call the states in T1' START = x1, x2, x3, . . . and the states in
T2' START = y1, y2, y3, . . . . The Tape in T3 will always look like this:


[Figure: the T3 Tape, laid out as  # (T1' Tape with its current state xk inserted) * (T2' Tape with its current state yk inserted) Δ Δ . . .]


where the meaning of this is as follows. Cell i always contains a #. Between this # and the one
and only * is the Tape status at the moment of the simulation of T1', with the exception that in
front of the cell that the T1' Tape Head will next be reading is the name of the state that T1' has
just arrived in. Then comes the symbol * that separates the simulation of T1' and the simulation
of T2'. Then the rest of the Tape is exactly what the current status of the Tape on T2' would be at
this moment, with the exception that in front of the cell that the T2' Tape Head will next be
reading is the name of the state that T2' has just entered. We assume that the # and the * as well
as the names of the states are all unused by T1' and T2' as Tape characters. This is a safe
assumption because in our simulation they are both painted a very rare shade of blue.

When we start with a simple input on T3, we have to use a subprogram to set up the
simulation. It inserts # in cell i and x1 in cell ii, runs to the end of the input and inserts * and y1,
and then runs up and down the Tape, copying the input string into the blank cells after the y1.
And then the Tape Head is returned to point to x1.
For example, the input abb goes from 



(The subprogram to do this is generously provided by the reader.) 
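For readers who want to see the end result of the setup, here is a minimal Python sketch of the Tape it produces. It is a hypothetical helper, not a TM: each list entry is one Tape cell, cell i is index 0, and the state names x1 and y1 are treated as single Tape characters, as the proof assumes.

```python
from typing import List, Tuple

def setup_tape(word: str) -> Tuple[List[str], int]:
    """Return the T3 Tape after setup and the position of the Tape Head."""
    tape = ["#", "x1", *word, "*", "y1", *word]   # cell i holds #, cell ii holds x1
    head = 1                                      # the Tape Head is returned to the x1 cell
    return tape, head
```

For the input abb this gives the cells # x1 a b b * y1 a b b, with the Tape Head reading x1.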

Before we proceed with the simulation, we should say a word about what happens when
T1' wants to read more cells of the Tape than the few we have allotted it between the # and *.
Whenever T1' moves its Tape Head right, we immediately ask whether or not it is reading a *. If
it is, we leave it alone, back up one cell, insert a Δ, and (because INSERT leaves the Tape Head
to the right of the insertion) read the *, leave it alone, and back up again to read the Δ.
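The net effect of this maneuver can be sketched on a Python list. This is a hypothetical helper that ignores the state-name markers to keep the example small; after the call the simulated T1' head sits on a fresh blank and the * has moved one cell to the right.

```python
from typing import List

BLANK = "Δ"

def move_right_on_t1_side(tape: List[str], head: int) -> int:
    """Move the simulated T1' Tape Head one cell right, opening a blank cell if it meets '*'."""
    head += 1
    if tape[head] == "*":
        tape.insert(head, BLANK)   # the new blank takes this cell; '*' shifts one cell right
    return head
```

Calling it repeatedly grows the T1' region one blank at a time, which is all the simulation ever needs.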



In this way, we can insert as many blanks as the simulation of T1' needs. These blanks
can be changed into other things, or other things can be made into blanks. So, blanks can
occur in the middle of the data and at the end of the data in the simulation of either TM. The T2'
simulation will never try to move left and read the * because that would correspond to a crash
on T2' of moving left from cell i, but that is not how T2' crashes, as we have guaranteed.

If the T1' simulation ever enters HALT, then T3 halts and accepts the input. If the T2'
simulation ever crashes, then T3 crashes and the input is rejected.

We still have to make explicit how T3 can “make a move on the T1' side and then make a
move on the T2' side alternately.” To understand this, let us first see what happens
immediately after the setup subprogram is done. The Tape Head is reading x1, which in turn is sitting
in front of an a. T3 is in a state called SIMULATE-T1'. This is the first important T3 state.

For every state xk in T1', this state has an outgoing edge labeled (xk, =, R) going to a
different T3 destination subprogram called SIM-xk. The first thing we do in this subprogram is back
up one cell and run subprogram DELETE, thereby removing the symbol xk from the Tape. Then
we read the letter that is in the next cell on the Tape. This is the letter that the T1' Tape Head
would be reading if the input were running on T1' alone. The program for T1' tells us what to
change this letter to and then where to move the Tape Head and then which T1' state to go to
next. The simulation has all this information built into it. It changes the T3 Tape and simulates
moving the T1' Tape Head by inserting the name of the next T1' state to be executed on the
running of T1' to the left of the appropriate Tape cell. For example, if the T3 Tape status is
| b | a | x5 | b | a |


and state x5 on T1' has the (unique) outgoing b-edge



then the simulation would change the T3 Tape into


The state SIM-x5 treats each edge coming out of x5 individually. Here, it correctly
corresponds to being in state x3 about to read an a.

After doing this, SIM-x5 then returns to the main T3 program to the state FIND-Y. In this
state, the T3 Tape Head is pushed right until it hits any y symbol. When it does, it enters
another important state called SIMULATE-T2'. This state reads the yk and branches to the
appropriate subprogram SIM-yk, where it does its T2' act. Once that has been completed, it
returns to the main T3 program to a state called FIND-X. This runs the Tape Head left down
the Tape until it finds the (one and only) xk. From here it goes into the state SIMULATE-T1'
and the process repeats itself.

The outline of the whole T3 is

[Figure: outline of T3: SETUP → SIMULATE-T1' → SIM-xk → FIND-Y → SIMULATE-T2' → SIM-yk → FIND-X → back to SIMULATE-T1']

The halting or crashing of T3 takes place entirely within the simulations and we are
certain that, for every input, one or the other will take place. The language that will be accepted
will be L and all of L' will be rejected. ■
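The alternation at the heart of T3 can be sketched in Python if we abstract away the Tape layout (#, *, and the inserted state names). In this sketch, which is our own modeling choice rather than the construction in the proof, each machine run is a generator: one `next()` call is one edge, `return True` means reaching HALT, and `return False` means crashing. `t1_prime` and `t2_prime` are hypothetical stand-ins for T1' and T2' when L = b(a + b)*, the language of the example that follows.

```python
from typing import Callable, Generator

# One step of a machine = one next(); `return True` = HALT, `return False` = crash.
Run = Generator[None, None, bool]

def t1_prime(word: str) -> Run:
    """Hypothetical T1' for L = b(a + b)*: halts on words of L, loops on all others."""
    if word.startswith("b"):
        yield                      # one step: read the b
        return True
    while True:                    # never crashes, only loops
        yield

def t2_prime(word: str) -> Run:
    """Hypothetical T2': crashes on the words of L', loops on the words of L."""
    if not word.startswith("b"):
        yield
        yield
        return False
    while True:
        yield

def t3(word: str,
       m1: Callable[[str], Run] = t1_prime,
       m2: Callable[[str], Run] = t2_prime) -> bool:
    """Alternate single steps of T1' and T2' until T1' halts or T2' crashes."""
    run1, run2 = m1(word), m2(word)
    while True:
        try:
            next(run1)
        except StopIteration as done:
            if done.value:
                return True        # T1' reached HALT: the word is in L
            raise ValueError("T1' must never crash")
        try:
            next(run2)
        except StopIteration as done:
            if not done.value:
                return False       # T2' crashed: the word is in L'
            raise ValueError("T2' must never accept")
```

Running T1' to completion first would hang forever on ab, because T1' loops on that word; taking one edge from each machine in turn is what guarantees an answer.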

Again, the machines produced by the algorithm in this proof are very large (many, many 
states), and it is hard to illustrate this method in any but the simplest examples. 


EXAMPLE 

Consider the language L = b(a + b)*. L can be accepted by the following TM, T1:


[Figure: TM T1; its edge labels include (a,b,R) and (b,b,R)]



accept(T1) = L
loop(T1) = L'
reject(T1) = ∅

The machine T1 proves that L is r.e., but not that L is recursive. The TM below, T2,


[Figure: TM T2; its edge labels include (a,a,R) and (b,a,R)]



accepts the language L' and loops on L. 

The first machine is already in T1' format and the only adjustment necessary in the
second to make it into T2' is to eliminate the HALT state and its incoming edges. We can
combine them per the algorithm in the proof to produce T3, which accepts L and rejects L',
thereby proving that L is recursive:

The first question that comes to most minds now is, “So what? Is the result of Theorem
61 so wonderful that it was worth a multipage proof?” The answer to this is not so much to 
defend Theorem 61 itself, but to examine the proof. 

We have taken two different TMs (they could have been completely unrelated) and com¬ 
bined them into one TM that processes an input as though it were running simultaneously on 
both machines. This is such an important possibility that it deserves its own theorem. 

THEOREM 62 

If T1 and T2 are TMs, then there exists a TM, T3, such that

accept(T3) = accept(T1) + accept(T2)

In other words, the union of two recursively enumerable languages is recursively
enumerable; the set of recursively enumerable languages is closed under union.


PROOF 

The algorithm in the proof of Theorem 61 is all that is required. First, we must alter T1 and
T2 so that they both loop instead of crash on those words that they do not accept.

Now nothing stops the two machines from running in alternation, accepting any words 
and only those words accepted by either. The algorithm for producing T 3 can be followed 
just as given in the proof of Theorem 61. 

On the new machine 

accept(T3) = accept(T1) + accept(T2)
loop(T3) = all else

reject(T3) = ∅ ■
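The same alternation gives the union machine. In the minimal Python sketch below, each run is modeled as a generator (one `next()` call is one edge, `return True` is HALT, `return False` is a crash); this modeling and the names `has_aa`, `starts_with_b` and `union_run` are our own assumptions. A run that crashes is simply never stepped again, which has the same effect as turning its crashes into loops. The `fuel` bound only exists so the example terminates; a result of `None` stands for T3 still running. `has_aa` mimics the three classes listed for the TM on p. 446.

```python
from typing import Callable, Generator, List, Optional

Run = Generator[None, None, bool]

def has_aa(word: str) -> Run:
    """Hypothetical acceptor for words containing aa, with the three classes listed for p. 446."""
    for i in range(len(word)):
        yield                                  # one step per letter examined
        if word[i:i + 2] == "aa":
            return True                        # HALT
    if word.endswith("a"):
        return False                           # crash: no aa and the word ends in a
    while True:
        yield                                  # loop: no aa and the word ends in b, or is empty

def starts_with_b(word: str) -> Run:
    """Hypothetical acceptor for b(a + b)*; it never crashes and loops on every other word."""
    yield                                      # read the first cell
    if word.startswith("b"):
        return True
    while True:
        yield

def union_run(machines: List[Callable[[str], Run]], word: str, fuel: int) -> Optional[bool]:
    """Step the machines in turn; accept as soon as one halts. None = still running after fuel rounds."""
    active = [m(word) for m in machines]
    for _ in range(fuel):
        for run in list(active):
            try:
                next(run)
            except StopIteration as done:
                if done.value:
                    return True
                active.remove(run)             # crashed: treat it as looping and never step it again
    return None
```

The union machine never returns `False`, matching reject(T3) = ∅: a word accepted by neither machine just keeps T3 busy.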

We have proven that the class of recursively enumerable languages is closed under
union by amalgamating two TMs. We are now interested in the question of the intersection
of two recursively enumerable languages. For regular languages, we found that the answer to
the question of closure under intersection was yes but for context-free languages the answer
was no. We could deduce that the intersection of two regular languages is regular based on the
facts that the union and complement of regular languages are also regular. Then by
DeMorgan’s Law, the intersection, which is the complement of the union of the complements, must
also be regular. Because the complement of a context-free language is not necessarily
context-free, this proof strategy does not carry over and, indeed, we saw that the intersection of
context-free languages need not be context-free. With recursively enumerable languages, we
have a third situation. They are closed under union and intersection but (we shall see) not
under complement.

THEOREM 63 

The intersection of two recursively enumerable languages is also recursively enumerable. 

PROOF 

Let one of the languages be accepted by TM1 and the other be accepted by TM2. We shall
now construct a third TM by the following set of modifications:

Step 1 Build a TM preprocessor that takes a two-track Tape and copies the input from
track 1 onto track 2 and returns the Tape Head to cell column i and begins
processing at the START state of TM1.

Step 2 Convert TM1 into a machine that uses a two-track Tape doing all of its
processing exactly as before but referring only to the top track. Also change the HALT
state of TM1 into a state that rewinds the Tape Head to cell column i and then
branches to the START state of TM2.

Step 3 Convert TM2 into a machine that uses a two-track Tape, doing all of its
processing exactly as before but referring only to the bottom track. Leave the
HALT state untouched.
We can now build a new TM that first runs the input string on TM1 and then, if and only
if the string is accepted, it runs the same input on TM2. The HALT state of this combined
machine is analogous to the HALT state of TM2, but it is reached only when the input has
halted on both TMs. This machine then accepts those words, and only those words, that are
accepted by both initial machines. It is, therefore, a TM acceptor of the intersection
language. ■
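The intersection construction runs the machines one after the other instead of alternately. In the minimal Python sketch below, each run is modeled as a generator (one `next()` call is one edge, `return True` is HALT, `return False` is a crash); the modeling and the names `has_aa`, `starts_with_b` and `intersection_run` are our own assumptions. Keeping the untouched copy of the input on track 2 corresponds to passing the original `word` to the second machine. The `fuel` bound only makes the example terminate; `None` means the combined machine is still running.

```python
from typing import Callable, Generator, Optional

Run = Generator[None, None, bool]

def has_aa(word: str) -> Run:
    """Hypothetical acceptor: halts on words with aa, crashes on words ending in a, else loops."""
    for i in range(len(word)):
        yield
        if word[i:i + 2] == "aa":
            return True
    if word.endswith("a"):
        return False
    while True:
        yield

def starts_with_b(word: str) -> Run:
    """Hypothetical acceptor for b(a + b)*; loops on every other word."""
    yield
    if word.startswith("b"):
        return True
    while True:
        yield

def intersection_run(m1: Callable[[str], Run], m2: Callable[[str], Run],
                     word: str, fuel: int) -> Optional[bool]:
    """Run TM1 on the input; only if it accepts, run TM2 on the untouched copy of the input."""
    steps = 0
    for machine in (m1, m2):
        run = machine(word)            # each machine sees the original input
        while True:
            if steps == fuel:
                return None            # budget exhausted: no verdict yet
            steps += 1
            try:
                next(run)
            except StopIteration as done:
                if not done.value:
                    return False       # a crash in either machine rejects the input
                break                  # this machine accepted: go on to the next one
    return True
```

Only words accepted by both machines reach the final `return True`, exactly as only such words reach the HALT state inherited from TM2.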


THE ENCODING OF TURING MACHINES 

It is now time to ask our usual questions about the class of r.e. languages. We have answered
the question about the union and intersection of r.e. languages, but that still leaves open
product, Kleene closure, complement, the existence of non-r.e. languages, and the
decidability of emptiness, finiteness and membership. We shall attack these in a slightly different
order than we did for the other language classes we analyzed.

TMs do seem to have immense power as language-acceptors or language-recognizers, 
yet there are some languages that are not accepted by any TM, as we shall now prove by 
“constructing” one. 

Before we can describe such a language, we need to develop the idea of encoding 
TMs. 

Just as with FAs and PDAs, we do not have to rely on pictorial representations for 
TMs. We can make a TM into a summary table and run words on the table as we did with 
PDAs in Chapter 15. The algorithm to do this is not difficult. First, we number the states 
1, 2, 3, . . . and so on. By convention, we always number the START state 1 and the 
HALT state 2. Then we convert every instruction in the TM into a row of the table as 
shown below: 


From  To  Read  Write  Move
1     3   a     a      L
3     1   Δ     b      R
8     2   b     a      R


where the column labeled “Move” indicates in which direction the Tape Head is to move. 


EXAMPLE 

The TM shown below: 


(b,b,R) 



can be summarized by the following table: 
From  To  Read  Write  Move
1     1   b     b      R
1     3   a     b      R
3     3   a     b      L
3     2   Δ     b      L


Because we know that state 1 is START and state 2 is HALT, we have all the
information in the table necessary to operate the TM. ■

We now introduce a coding whereby we can turn any row of the TM into a string of a’s 
and b’s. 

Consider the general row 


From  To  Read  Write  Move
X1    X2  X3    X4     X5

where X1 and X2 are numbers, X3 and X4 are characters from {a b #} or Δ, and X5 is a
direction (either L or R).

We start by encoding the information X1 and X2 as

$a^{X_1}ba^{X_2}b$

which means a string of a’s of length X1 concatenated to a b concatenated to a string of a’s
X2 long concatenated to a b. This is a word in the language defined by $a^+ba^+b$.

Next, X3 and X4 are encoded by this table:

X3, X4  Code
a       aa
b       ab
Δ       ba
#       bb

Next, we encode X5 as follows:

X5  Code
L   a
R   b

Finally, we assemble the pieces by concatenating them into one string. For example, the row

From  To  Read  Write  Move
6     2   b     a      L

becomes

aaaaaabaababaaa

which breaks down as

aaaaaa   state 6
b        separator
aa       state 2
b        separator
ab       read b
aa       write a
a        move left

Every string of a’s and b’s that is a row is of the form definable by the regular expression

$a^+ba^+b(a + b)^5$

= (at least one a) b (at least one a) b (five letters)

It is also true that every word defined by this regular expression can be interpreted as a
row of a TM summary table with one exception: We cannot leave a HALT state. This means
that $aaba^+b(a + b)^5$ defines a forbidden sublanguage.
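The row encoding is mechanical enough to write down directly. In this minimal Python sketch the names `encode_row`, `ROW` and `FORBIDDEN` are our own, and Δ stands for the blank; `ROW` is the regular expression above in the syntax of Python's `re` module, and `FORBIDDEN` is the HALT-leaving sublanguage.

```python
import re
from typing import Tuple

SYMBOL_CODE = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}   # codes for X3 and X4
MOVE_CODE = {"L": "a", "R": "b"}                             # code for X5
ROW = re.compile(r"a+ba+b[ab]{5}")          # a+ b a+ b (a + b)^5
FORBIDDEN = re.compile(r"aaba+b[ab]{5}")    # rows whose From state is 2 (HALT)

Row = Tuple[int, int, str, str, str]        # (From, To, Read, Write, Move)

def encode_row(row: Row) -> str:
    """Concatenate the codes of the five entries of one summary-table row."""
    frm, to, read, write, move = row
    return ("a" * frm + "b" + "a" * to + "b"
            + SYMBOL_CODE[read] + SYMBOL_CODE[write] + MOVE_CODE[move])
```

For example, `encode_row((6, 2, "b", "a", "L"))` gives aaaaaabaababaaa, the word worked out above, and `encode_row((2, 1, "a", "a", "L"))` lands in the forbidden sublanguage.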

Not only can we make any row of the table into a string, but we can also make the whole 
summary table into one long string by concatenating the strings that represent the rows. 

EXAMPLE 


The preceding summary table can be made into a string of a’s and b’s as follows: 



One one-word code for the whole machine is 

ababababbabaaabaaabbaaabaaabaaabaaaabaabbaaba 

This is not the only one-word code for this machine because the order of the rows in the
table is not rigid. We can standardize the code word by insisting that the row codes be
amalgamated in their lexicographic order. ■
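Encoding a whole machine is then just concatenation. The sketch below (hypothetical names `encode_row`, `encode_machine`, `TABLE`) reproduces the code word given above from the summary table of the preceding example. For the standardized form we assume that lexicographic order means shorter codes first and alphabetical order among codes of equal length; if plain dictionary order is intended, drop `len(c)` from the sort key.

```python
from typing import List, Tuple

SYMBOL_CODE = {"a": "aa", "b": "ab", "Δ": "ba", "#": "bb"}
MOVE_CODE = {"L": "a", "R": "b"}
Row = Tuple[int, int, str, str, str]        # (From, To, Read, Write, Move)

def encode_row(row: Row) -> str:
    frm, to, read, write, move = row
    return ("a" * frm + "b" + "a" * to + "b"
            + SYMBOL_CODE[read] + SYMBOL_CODE[write] + MOVE_CODE[move])

def encode_machine(rows: List[Row], standardize: bool = False) -> str:
    """Concatenate the row codes; optionally sort them first to get one standard code word."""
    codes = [encode_row(r) for r in rows]
    if standardize:
        codes.sort(key=lambda c: (len(c), c))   # assumption: shorter first, then alphabetical
    return "".join(codes)

# The summary table of the example above.
TABLE: List[Row] = [
    (1, 1, "b", "b", "R"),
    (1, 3, "a", "b", "R"),
    (3, 3, "a", "b", "L"),
    (3, 2, "Δ", "b", "L"),
]
```

Listing the rows in a different order changes the plain code word but not the standardized one.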

It is also important to observe that we can look at such a long string and decode the TM 
from it, provided that the string is in the proper form, that is, as long as the string is a word 
in the code word language (CWL).

(For the moment, we shall not worry about the forbidden HALT-leaving strings. We 
consider them later.) 

CWL = the language defined by $(a^+ba^+b(a + b)^5)^*$

ALGORITHM 


The way we decode a string in CWL is as follows: 
Step 1 Count the initial clump of a’s and fill in that number in the first entry of the first
empty row of the table. 

Step 2 Forget the next letter; it must be a b. 

Step 3 Count the next clump of a’s and fill in that number in the second column of this
row. 

Step 4 Skip the next letter; it is a b . 

Step 5 Read the next two letters. If they are aa, write an a in the Read box of the table.
       If they are ab, write a b in the table. If they are ba, write a Δ in the table. If they
       are bb, write a # in the table.

Step 6 Repeat step 5 for the table Write entry. 

Step 7 If the next letter is an a, write an L in the fifth column of the table; otherwise, 
write an R. This fills in the Move box and completes the row. 

Step 8 Starting with a new line of the table, go back to step 1, operating on what
       remains of the string. If the string has been exhausted, stop. The summary table is
       complete. ■
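Steps 1 through 8 translate almost line for line into a regular-expression scan. The sketch below is a minimal Python version with hypothetical names (`ROW_CODE`, `decode`). Each match of `ROW_CODE` is one row code, and its five groups hold the two clumps of a’s (steps 1 to 4), the Read code (step 5), the Write code (step 6) and the Move code (step 7). A word outside CWL raises `ValueError`.

```python
import re
from typing import List, Tuple

ROW_CODE = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")
SYMBOL = {"aa": "a", "ab": "b", "ba": "Δ", "bb": "#"}
MOVE = {"a": "L", "b": "R"}
Row = Tuple[int, int, str, str, str]        # (From, To, Read, Write, Move)

def decode(word: str) -> List[Row]:
    """Peel one row code at a time off the front of a CWL word."""
    rows: List[Row] = []
    pos = 0
    while pos < len(word):                  # step 8: stop when the string is exhausted
        m = ROW_CODE.match(word, pos)
        if m is None:
            raise ValueError(f"not a CWL word: no row code starts at position {pos}")
        frm, to, read, write, move = m.groups()
        rows.append((len(frm), len(to), SYMBOL[read], SYMBOL[write], MOVE[move]))
        pos = m.end()
    return rows
```

Applied to the word in the next example, `decode` returns the rows (1, 3, a, a, R), (3, 3, a, a, R) and (3, 2, b, b, L).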


EXAMPLE 


Consider the string 


abaaabaaaabaaabaaabaaaabaaabaabababa 


The first clump of a’s is one a. Write 1 in the first line of the table. Drop the b. The next
part of the string is a clump of three a’s. Write 3 in row 1, column 2. Drop the b. Now aa
stands for a. Write a in column 3. Again, aa stands for a. Write a in column 4. Then b stands
for R. Write this in column 5, ending row 1. Starting again, we have a clump of three a’s so
start row 2 by writing a 3 in column 1. Drop the b. Three more a’s, write a 3. Drop the b.
Now aa stands for a; write it. Again, aa stands for a; write it. Then b stands for R. Finish
row 2 with this R. What is left is three a’s, drop the b, two a’s, drop the b, then ab, and ab,
and a, meaning b, and b, and L. This becomes row 3 of the table. We have now exhausted the
CWL word and have therefore finished a table.

The table and machine are 

From  To  Read  Write  Move
1     3   a     a      R
3     3   a     a      R
3     2   b     b      L



The result of this encoding process is that every TM corresponds to a word in CWL. 
However, not all words in CWL correspond to a TM. There is a little problem here because 
when we decode a CWL string, we might get an improper TM such as one that is
nondeterministic or repetitive (two rows the same) or violates the HALT state, but this should not
dull our enthusiasm for the code words. These problems will take care of themselves, as we 
shall see. 


A NON-RECURSIVELY ENUMERABLE LANGUAGE

The code word for a TM contains all the information of the TM, yet it can be considered as 
merely a name—or worse yet, input. Because the code for every TM is a string of a's and 
b's, we might ask what happens if this string is run as input on the very TM it stands for. We 
shall feed each TM its own code word as input data. Sometimes it will crash, sometimes 
loop, sometimes accept. 

Let us define the language ALAN as follows. 

DEFINITION 

ALAN = {all the words in CWL that are not accepted by the TMs they 

represent or that do not represent any TM} ■ 

EXAMPLE 

Consider the TM 



The table for this machine is simply

From  To  Read  Write  Move
1     2   b     b      R
The code word for this TM is 

abaabababb 

But if we try to run this word on the TM as input, it will crash in state 1 because there is 
no edge for the letter a leaving state 1. 

Therefore, the word 

abaabababb 



is in the language ALAN. 
EXAMPLE


The words 


aababaaaaa 


and aaabaabaaaaa 


are in CWL but do not represent any TM, the first because it has an edge leaving HALT and 
the second because it has no START state. Both words are in ALAN. ■ 


EXAMPLE 

In one earlier example, we found the TM corresponding to the CWL word 

abaaabaaaabaaabaaabaaaabaaabaabababa 

When this word is run on the TM it represents, it is accepted. This word is not in 
ALAN. ■ 

EXAMPLE 

If a TM accepts all inputs, then its code word is not in ALAN. If a TM rejects all inputs, then 
its code word is in ALAN. Any TM that accepts the language of all strings with a double a 
will have a code word with a double a and so will accept its own code word. The code words 
for these TMs are not in ALAN. The TM we built in Chapter 19 to accept the language 
PALINDROME has a code word that is not a palindrome. Therefore, it does not accept its 
code word and its code word is in ALAN. ■ 
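For any particular code word we can at least try to settle its membership in ALAN by brute force: decode it, check that it describes a TM, and run that machine on its own code word for a bounded number of steps. The Python sketch below does this with hypothetical names (`decode`, `as_tm`, `in_alan`). What counts as "not a TM" is our assumption: an edge leaving HALT (state 2), no occurrence of the START state 1, or two rows that disagree for the same state and letter. A result of `None` means the step budget ran out; the argument that follows shows that no TM accepts exactly the words of ALAN, so these undecided cases can never all be resolved.

```python
import re
from typing import Dict, List, Optional, Tuple

BLANK = "Δ"
ROW_CODE = re.compile(r"(a+)b(a+)b([ab]{2})([ab]{2})([ab])")
SYMBOL = {"aa": "a", "ab": "b", "ba": BLANK, "bb": "#"}
MOVE = {"a": -1, "b": 1}                       # L moves left, R moves right
Row = Tuple[int, int, str, str, int]           # (From, To, Read, Write, Move)
Table = Dict[Tuple[int, str], Tuple[str, int, int]]

def decode(word: str) -> Optional[List[Row]]:
    """Decode a CWL word into summary-table rows; None if the word is not in CWL."""
    rows: List[Row] = []
    pos = 0
    while pos < len(word):
        m = ROW_CODE.match(word, pos)
        if m is None:
            return None
        f, t, r, w, mv = m.groups()
        rows.append((len(f), len(t), SYMBOL[r], SYMBOL[w], MOVE[mv]))
        pos = m.end()
    return rows

def as_tm(rows: List[Row]) -> Optional[Table]:
    """Build the transition table, or return None if the rows describe no TM (our criteria)."""
    table: Table = {}
    for f, t, r, w, mv in rows:
        if f == 2 or table.get((f, r), (w, mv, t)) != (w, mv, t):
            return None                        # leaves HALT, or clashes with an earlier row
        table[(f, r)] = (w, mv, t)
    if not any(1 in (f, t) for f, t, *_ in rows):
        return None                            # no START state
    return table

def in_alan(word: str, fuel: int) -> Optional[bool]:
    """True/False when settled within `fuel` steps; None while the self-run is still going."""
    rows = decode(word)
    if rows is None:
        raise ValueError("ALAN is defined only for words of CWL")
    table = as_tm(rows)
    if table is None:
        return True                            # represents no TM
    tape, state, head = dict(enumerate(word)), 1, 0
    for _ in range(fuel):
        if state == 2:
            return False                       # the machine accepted its own code word
        edge = table.get((state, tape.get(head, BLANK)))
        if edge is None:
            return True                        # crash: not accepted
        write, move, state = edge
        tape[head] = write
        head += move
        if head < 0:
            return True                        # moved left from cell i: crash
    return False if state == 2 else None
```

The sketch agrees with the examples above: abaabababb crashes on its own code word, aababaaaaa and aaabaabaaaaa describe no TM, and the word from the decoding example is accepted by the machine it represents.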

We shall now prove that the language ALAN is not recursively enumerable. We prove 
this by contradiction. Let us begin with the supposition that ALAN is r.e. In that case, there 
would be some TM that would accept all the words in ALAN. Let us call one such TM T. 
Let us denote the code word for T as code(T). Now we ask the question:

Is code(T) a word in the language ALAN or not?

There are clearly only two possibilities: yes or no. Let us work them out with the precision 
of Euclidean geometry. 


CASE 1: code(T) is in ALAN 


CLAIM                                        REASON

1. T accepts ALAN.                           1. Definition of T.
2. ALAN contains no code word that is        2. Definition of ALAN.
   accepted by the machine it represents.
3. code(T) is in ALAN.                       3. Hypothesis.
4. T accepts the word code(T).               4. From 1 and 3.
5. code(T) is not in ALAN.                   5. From 2 and 4.
6. Contradiction.                            6. From 3 and 5.
7. code(T) is not in ALAN.                   7. The hypothesis (3) must be wrong
                                                because it led to a contradiction.

Again, let us use complete logical rigor.


CASE 2: code(T) is not in ALAN 


CLAIM                                        REASON

1. T accepts ALAN.                           1. Definition of T.
2. If a word is not accepted by the          2. Definition of ALAN.
   machine it represents, it is in ALAN.
3. code(T) is not in ALAN.                   3. Hypothesis.
4. code(T) is not accepted by T.             4. From 1 and 3.
5. code(T) is in ALAN.                       5. From 2 and 4.
6. Contradiction.                            6. From 3 and 5.
7. code(T) is in ALAN.                       7. The hypothesis (3) must be wrong
                                                because it led to a contradiction.


Both cases are impossible; therefore, the assumption that ALAN is accepted by some TM is 
untenable. ALAN is not recursively enumerable. 


THEOREM 64 

Not all languages are recursively enumerable. ■ 

This argument usually makes people’s heads spin. It is very much like the old “liar
paradox,” which dates back to the Megarians (attributed sometimes to Eubulides and sometimes
to the Cretan Epimenides) and runs like this. A man says, “Right now, I am telling a lie.” If it 
is a lie, then he is telling the truth by confessing. If it is the truth, he must be lying because 
he claims he is. Again, both alternatives lead to contradictions. 

If someone comes up to us and says, “Right now, I am telling a lie,” we can walk away and
pretend we did not hear anything. If someone says to us, “If God can do anything, He can make
a stone so heavy that He cannot lift it,” we can burn him as a blaspheming heretic. If someone
asks us, “In a certain city the barber shaves all those who do not shave themselves and only
those. Who shaves the barber?”, we can answer, “The barber is a woman.” However, here we
have used this same old riddle not to annoy Uncle Charlie, but to provide a mathematically
rigorous proof that there are languages that TMs cannot recognize.

The liar paradox and other logical paradoxes are very important in computer theory, as we
can see by the example of the language ALAN. In fact, the whole development of the computer
came from the same kind of intellectual concern as was awakened by consideration of these
paradoxes.

The study of logic began with the Greeks (in particular, Aristotle and Zeno of Elea) but then
lay dormant for millennia. The possibility of making logic a branch of mathematics began in
1666 with a book by Gottfried Wilhelm von Leibniz, who was also the coinventor of calculus
and an early computer man (see Chapter 1). His ideas were continued by George Boole in the
nineteenth century.

About a hundred years ago, Georg Cantor invented set theory and immediately a connection
was found between set theory and logic. This allowed the paradoxes from logic, previously a
branch of philosophy, to creep into mathematics. That mathematics could contain paradoxes
had formerly been an unthinkable situation. When logic was philosophical and rhetorical, the
paradoxes were tolerated as indications of depth and subtlety. In mathematics, paradoxes are
an anathema. After the invention of set theory, there was a flood of paradoxes from Cesare
Burali-Forti, Cantor himself, Bertrand Russell, Jules Richard, Julius König, and many other
mathematical logicians. This made it necessary to be much more precise about which
sentences do and which sentences do not describe meaningful mathematical operations. This
led to Hilbert’s question of the decidability of mathematics and then to the development of
the theory of algorithms and to the work of Gödel, Turing, Post, Church (whom we shall meet
shortly), Kleene, and von Neumann, which in turn led to the computers we all know (and
love). In the meantime, mathematical logic, from Gottlob Frege, Russell, and Alfred North
Whitehead on, has been strongly directed toward questions of decidability.

The fact that the language ALAN is not recursively enumerable is not its only unusual
feature. The language ALAN is defined in terms of TMs. It cannot be described to people who
do not know what TMs are. It is quite possible that all the languages that can be thought of
by people who do not know what TMs are are recursively enumerable. (This sounds like its own
small paradox.) This is an important point because, since computers are (approximate) TMs,
and since our original goal was to build a universal algorithm machine, we want TMs to accept
practically everything. Theorem 64 is definitely bad news. If we are hoping for an even more
powerful machine to be defined in Part IV of this book that will accept all possible
languages, we shall be disappointed for reasons soon to be discussed.

THE UNIVERSAL TURING MACHINE 

The idea of encoding a TM program into a string of a’s and b’s to be fed into itself is
potentially more profitable than we have yet appreciated. When a TM program is made into an
input string, it may be fed into other TMs for other purposes. What we shall now design is a
TM that can accept as input two strings separated by a marker, where the first string is the
encoding of some TM program and the second string is data that our machine will operate on
as if it were the TM described by the first input string. In other words, our new TM will
simulate the running of the encoded TM on the data string. This is not a simulation in the
sense of the proof of Theorem 61 (p. 538), where we designed a special TM to act as if it
were two particular TMs operating simultaneously. There we built a very different T_3 for
each pair of starting machines T_1 and T_2. What we shall construct here is one and only one,
good for all time, TM that can imitate the action of any TM described to it on any arbitrary
data string we choose. The states and edges of our TM will not vary, but it will, by referring
to the half of the input that is the encoded TM program, mimic those operations on the other
half of the input, the intended data string fed into the encoded machine.

We might ask, “What is the advantage of such a thing?” If we want to see how TM T_1 acts on
a particular input string, why not just feed T_1 the input in person? Why bother to feed an
encryption of T_1 and the data into a second TM to run a simulation? There are many reasons
for designing such a machine, and they will become evident shortly, but a computer science
major should be ashamed of asking such a question when the answer is obvious. What we are
building is a programmable TM. Instead of building a different computer for each possible
program, we are building a computer that accepts a set of instructions (a program) and input
data and acts on the data according to the instructions.

Let us recapitulate the impetus for the invention of the computer. Hilbert asked for an
algorithm that would generate a solution for any mathematical problem posed to it. The
solution could be a simple numerical answer, a mathematical proof, or an algorithm for
resolving special classes of questions. In order to begin working on such an ambitious
project, logicians began to design small instruction sets in which all mathematical problems
could be stated, and from which all mathematical solutions could be composed. Gödel
constructed a mathematical statement that, if it were provable, would be false, but if it were
not provable, would be true. This meant that Hilbert’s abstract goal could not be reached in
total, because the truth or provability of Gödel’s statement would always remain unanswered.
But it was possible that the trouble caused by Gödel’s statement could be contained, and that
the bulk of Hilbert’s ambition could somehow still be fulfilled.

That was until the work of Turing. He introduced the universal algorithm machine that could
execute any mathematical algorithm that could theoretically ever be designed. He used it to
show that even this machine had severe, irreparable limitations; that is, there were
mathematical problems that simply could not be solved by any algorithm. This universal
algorithm machine is the TM we have been describing (and will build) in this section, and the
limitations just mentioned will be elucidated soon in terms of the TM language questions that
arise naturally in their analogy to regular and context-free languages.

Even though Turing’s universal machine was limited in theory, still it could execute all
known algorithms and all algorithms discoverable in the future. Although not enough to
satisfy Hilbert’s dream, this is still quite a feat. By fortunate accident, Turing’s model of a
programmable machine was so simple that soon after his theoretical paper was published,
people began to build real physical models of what was originally intended as an abstract
mathematical construct to settle (or scuttle) a project in pure mathematics. Electrical
engineers had already been working on producing more and more sophisticated calculating
devices, performing sequences of arithmetic operations, boosted by the speedy revolution in
electronic technology that was simultaneously being developed with no apparent connection
to the crisis in mathematical logic.

Instead of having to build a different electronic device for each algorithm, Turing’s
mathematical work showed how one universal machine would suffice to simulate all
algorithms with a very restricted working set of instructions and memory capabilities. The
mathematical project was not completed until von Neumann (a star mathematician, logician,
and engineer) showed how to actualize a programmable computer in which the instructions,
because they are fed in as data, could not only operate on the separate data field, but also
could modify their own program as it was running. This allowed the writing of programs that
could change their conditional branching instructions, evolve by writing new instructions for
themselves, and potentially learn from their experience on one data set to change what they
do to another. This then was the final step in the theoretical foundation of what is a
computer. In this text, we emphasize Turing’s contribution but pay little attention to von
Neumann’s extension of it. That is only because we have to draw the line somewhere.

DEFINITION 

A universal TM, a UTM, is a TM that can be fed as input a string composed of two parts: the
first is the encoded program of any TM T followed by a marker; the second part is a string
that will be called data. The operation of the UTM is that, no matter what machine T is, and
no matter what the data string is, the UTM will operate on the data as if it were T. If T would
have crashed on this input, it will crash; if T would loop forever, it will loop forever; and if T
would accept the input, the UTM does so too. Not only that, but the UTM will leave on its
Tape the encoded T, the marker, and the contents of what T would leave on its Tape when it
accepts this very input string. ■
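Functionally, a UTM is an interpreter: one fixed procedure whose behavior is steered by the program half of its input. The following Python sketch illustrates only that idea, not the Tape-level construction in the proof below. It assumes, as our own conventions, that the program is a dictionary mapping (state, character read) to (character written, move, next state), that states are numbered with 1 as START and 2 as HALT as in the text, and that Δ is the blank. A missing edge or a left move from cell i is a crash. The step budget `max_steps` is also our addition: running out of it reports `"UNDECIDED"`, which says nothing about whether T would eventually halt.

```python
from typing import Dict, Tuple

BLANK = "Δ"
START, HALT = 1, 2  # q_1 = START and q_2 = HALT, as in the text

# (state, character read) -> (character written, "L" or "R", next state)
Program = Dict[Tuple[int, str], Tuple[str, str, int]]


def contents(tape: Dict[int, str]) -> str:
    """Tape contents from cell i onward, with trailing blanks dropped."""
    if not tape:
        return ""
    cells = [tape.get(i, BLANK) for i in range(max(tape) + 1)]
    return "".join(cells).rstrip(BLANK)


def simulate(program: Program, data: str, max_steps: int = 10_000) -> Tuple[str, str]:
    """Run the given program on data; report the outcome and the final Tape."""
    tape = dict(enumerate(data))
    head, state = 0, START
    for _ in range(max_steps):
        read = tape.get(head, BLANK)
        if (state, read) not in program:
            return "CRASH", contents(tape)  # no edge for this state and character
        write, move, state = program[(state, read)]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return "CRASH", contents(tape)  # moved left from cell i
        if state == HALT:
            return "ACCEPT", contents(tape)
    return "UNDECIDED", contents(tape)  # budget spent; looping is NOT proven


# A one-edge machine that accepts exactly the words beginning with a.
starts_with_a: Program = {(START, "a"): ("a", "R", HALT)}
print(simulate(starts_with_a, "ab"))  # ('ACCEPT', 'ab')
print(simulate(starts_with_a, "ba"))  # ('CRASH', 'ba')
```

The fixed function `simulate` never changes; only the `program` argument does, which is the sense in which one machine serves for all machines.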

We have been careful to imply that there does not exist only one unique UTM but perhaps
many, depending on the choice of encoding algorithm for the machine T and the algorithm
chosen for simulation. In the previous section, we encoded TMs into strings of a’s and b’s. It
will be easier for us to describe the working of a UTM employing a different encoding
algorithm, one that is slightly less universal as it makes restrictions on the number of states
the TM to be simulated can have and on the size of that TM’s Tape character set. Let us
assume, for the time being, that the TM to be encoded has at most 1 million states
q_1 = START, q_2 = HALT, q_3, q_4, ... . Let us also assume that there are at most 1 million
different characters that the TM T can ever employ on its Tape (including its input alphabet):

c_1, c_2, c_3, c_4, ...

We can now reduce every row of the tabular description of the TM T to a series of syllables
of the form q_x c_y c_z M q_w (the edge from state q_x to state q_w that reads c_y, writes c_z,
and moves the Tape Head in direction M), where M is either L or R. In order to be sure that no
confusion arises, let us assume that none of the characters c is the same as any of the
characters q and that neither of them is the same as L or R. Let us also assume that this
character set does not contain our particular set of markers # and $.

This is truly a limitation because UTMs are supposed to be able to handle the simulation of
any T, not just one with under a million states and under a million characters. However,
these assumptions will have the advantage of simplifying the description of the UTM because
the name of each state and each character is one symbol long, as opposed to the encoding
given in the previous section where there could be arbitrarily many states and characters and
their corresponding designations could increase in length enormously (unboundedly). After
we are finished designing our limited model, we will describe how it could be modified to run
on the unrestricted encoding in the previous section.

With this encoding scheme, every TM can be fully encoded into a word formed from the
concatenation of finitely many syllables of the type described above. Every substring of two
consecutive q’s necessarily denotes the break between two edge instructions in the TM T.
Every substring of two consecutive c’s necessarily denotes a read and write section of an
edge instruction and is necessarily followed by an L or R. To distinguish this encoding
strategy from the one presented before, we call this encryption TM coding, TMC, and we
designate the TMC code word for the machine T as TMC(T).
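The syllable structure is easy to mirror in Python. In this sketch, which is our own illustration, each character of the TMC alphabet is modeled as one Python value: `("q", n)` for q_n, `("c", n)` for c_n, and the strings `"L"` and `"R"`, so every state and character name really is one symbol long. The helper names `tmc_encode`, `tmc_decode`, and `edge_breaks` are hypothetical.

```python
from typing import List, Tuple, Union

Edge = Tuple[int, int, int, str, int]  # (x, y, z, M, w): in q_x reading c_y, write c_z, move M, go to q_w
Symbol = Union[Tuple[str, int], str]   # ("q", n), ("c", n), "L", or "R"


def tmc_encode(edges: List[Edge]) -> List[Symbol]:
    """Concatenate one five-symbol syllable q_x c_y c_z M q_w per edge."""
    word: List[Symbol] = []
    for x, y, z, move, w in edges:
        if move not in ("L", "R"):
            raise ValueError(f"bad move {move!r}")
        word += [("q", x), ("c", y), ("c", z), move, ("q", w)]
    return word


def tmc_decode(word: List[Symbol]) -> List[Edge]:
    """Cut the word back into syllables of five symbols each."""
    if len(word) % 5:
        raise ValueError("a TMC word is a whole number of syllables")
    edges: List[Edge] = []
    for i in range(0, len(word), 5):
        (_, x), (_, y), (_, z), move, (_, w) = word[i:i + 5]
        edges.append((x, y, z, move, w))
    return edges


def is_q(s: Symbol) -> bool:
    return isinstance(s, tuple) and s[0] == "q"


def edge_breaks(word: List[Symbol]) -> List[int]:
    """Positions where a q follows another q: the start of a new edge."""
    return [i + 1 for i in range(len(word) - 1) if is_q(word[i]) and is_q(word[i + 1])]


edges = [(1, 1, 1, "R", 3), (3, 2, 1, "L", 2)]
word = tmc_encode(edges)
print(tmc_decode(word) == edges)  # True
print(edge_breaks(word))          # [5]
```

Only one break is reported for two edges, because inside a syllable the pattern q c c M q never puts two q’s side by side.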
THEOREM 65

UTMs exist.


PROOF 

Initially, the UTM Tape will contain the following: the cell i marker #, the TMC code for
some TM T, the separator $, and the data field d_1 d_2 d_3 ... made up of a finite string of
characters from the alphabet {c_1, c_2, ...}.


[Figure: UTM Tape: # | TMC(T) | $ | data | Δ | Δ | ...]


This is the correct form of the input string into the UTM. We are not responsible for what may 
happen to an input string that is not in this precise form. The first state of the UTM is, of course, 
START. From there we go to a state searching for the first character of the data string. 
[Figure: the search state loops on (any non-$,=,R)]


We are now in a UTM state reading the first character of the data string. Into this cell we
insert the state we know the simulated machine T to be in at this moment, that is, its START
state q_1.


This marks the fact that T is in the state to the left of the UTM Tape Head and its own Tape
Head is reading a cell whose contents are those the UTM Tape Head is now reading. Except for
the q-symbol, which we shall continue to employ as a T Tape Head indicator throughout the
simulation, the data field of the UTM Tape will always be kept exactly the same as the whole
TM T Tape.

We are now ready to do our main iteration. Based on the state we know we are in on the
simulation and the character we know we are reading in the simulation, we head for the
appropriate one of the million squared possible combination states q_x & c_y.




We shall now proceed as if we are farther along into our simulation and we have reached the
situation of being in state q_x on T and reading character c_y on the T Tape. On the UTM we
are in state q_x & c_y. Once we know that we are in such a situation, we wind the UTM Tape
Head left until we cross the $, entering the TMC code for T, and we search there for the
substring q_x c_y because this represents being in state q_x on TM T and reading the
character c_y. At most one such substring exists because T is deterministic. The following
UTM code will accomplish this:
[Figure: leftward search loop for state q_x & c_y; edge labels include (any non-q_x and non-c_y,=,L) and (q_x,blue-q_x,R)]


When we get to this state, we have found the correct TM T edge to take to simulate the
running of the T machine. We have marked its state by turning it blue. So we need a blue set
of q’s as characters too. We mark it so that we can run up the T Tape simulation to the right
of the $, do the writing, and still later return to this instruction. What would happen if we
ran down the whole UTM Tape to cell i and read the # without finding the substring we were
looking for? The answer is that T would have no c_y-edge coming out of state q_x and we
would have to simulate a crash. We have our choice of ways for doing this, so we leave the
selection of this option up to the UTM purchaser.
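The leftward search can be written as a short scan over a list-based Tape. This is a sketch under our own modeling assumptions: the UTM Tape is a Python list laid out as `"#"`, the TMC symbols, `"$"`, then the data field, with symbols written as `("q", n)` and `("c", n)`; the function name `find_edge` is hypothetical. Since a q immediately followed by a c occurs only at the start of a syllable, and T is deterministic, the first match found is the only one.

```python
from typing import List, Optional, Tuple, Union

Symbol = Union[Tuple[str, int], str]


def find_edge(tape: List[Symbol], x: int, y: int) -> Optional[int]:
    """Scan left from the $ toward the # for the pair q_x c_y.

    Returns the index of q_x (the symbol the UTM would paint blue),
    or None if T has no c_y-edge out of q_x, in which case T crashes.
    """
    i = tape.index("$") - 1
    while tape[i] != "#":
        if tape[i] == ("c", y) and tape[i - 1] == ("q", x):
            return i - 1
        i -= 1
    return None


# Two edges of T: (q_1, read c_1, write c_2, R, q_3) and (q_3, read c_2, write c_2, L, q_2).
code = [("q", 1), ("c", 1), ("c", 2), "R", ("q", 3),
        ("q", 3), ("c", 2), ("c", 2), "L", ("q", 2)]
tape = ["#", *code, "$", ("q", 1), ("c", 1)]
print(find_edge(tape, 3, 2))  # 6
print(find_edge(tape, 3, 1))  # None
```

Note that the pair q_1 c_1 on the data side of the $ is never examined: the scan starts just left of the $ and stops at the #.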

We must now simulate the operation of being in q_x reading a c_y on TM T. We must find what
character T wants to convert the c_y into. Then we must go to a state that remembers what
that character is (there are a million of them, one for each possible character), run the Tape
Head up the UTM Tape until it crosses the $ barrier, enter the T Tape simulation, find the
unique q-symbol on this side of the $, and change the next cell from c_y to this new
character.

We are not yet done with the simulation. We must now run back down the UTM Tape looking for
the blue-q to the left of the $ and find out how T wants its Tape Head moved and what T-state
it wants to enter next. Here, the UTM program is as follows. Un-blue the q-state, skip the
read field of the TMC(T) edge, skip the write field, and branch on the Tape Head move field,
and then branch again on the new state until we reach the appropriate one of the 2 million
states “L & q_z” or “R & q_z.”


[Figure: branching on the move and the new state; edge labels include (blue-q_x,q_x,R), (any,=,R), and one branch (q_1,q_1,R), (q_2,q_2,R), (q_3,q_3,R), ... for each possible new state]


When we are in this M- and q-state, we race back up the UTM Tape, past the $ marker, and up
to where we read q_x again. This time we DELETE it and INSERT the new q_z, either two cells
before or one cell after the cell we are in, depending on whether the simulation of T wanted
the T Tape Head moved left or right.
[Figure: edge labels (any non-$,=,R) and (any non-q,=,R)]



After inserting q_z, we branch on the character c_w that we encounter in the cell after it to
an appropriate q_z & c_w state. Then we move left down the Tape, searching for the substring
q_z c_w, and the whole process reiterates.
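The write-and-move step on the data side of the $ amounts to a few list operations. In this sketch (our own modeling, with the hypothetical name `apply_edge`), the data field is a Python list in which exactly one `("q", n)` marker sits immediately to the left of the cell T is reading. We assume that a missing cell beyond the data reads as a blank, written `"Δ"` here for simplicity, although a full TMC encoding would have to name the blank as one of the c’s.

```python
from typing import List, Optional, Tuple, Union

BLANK = "Δ"  # assumption: stands in for whichever c names the blank
Symbol = Union[Tuple[str, int], str]


def apply_edge(data: List[Symbol], z: int, move: str, w: int) -> Optional[List[Symbol]]:
    """Simulate one step of T on the data field: write c_z, then move the marker.

    Returns the new data field, or None if T would move left from its cell i.
    """
    data = list(data)  # work on a copy
    i = next(k for k, s in enumerate(data) if isinstance(s, tuple) and s[0] == "q")
    if move == "L" and i == 0:
        return None  # T's Tape Head is in cell i: crash
    if i + 1 == len(data):
        data.append(BLANK)  # T is reading a blank beyond the data
    data[i + 1] = ("c", z)  # T writes c_z in the cell it is reading
    del data[i]  # DELETE the old state marker
    if move == "R":
        data.insert(i + 1, ("q", w))  # marker goes after the cell just written
    else:
        data.insert(i - 1, ("q", w))  # marker goes before the cell to its left
    return data


before = [("c", 1), ("q", 1), ("c", 2), ("c", 3)]  # T is in q_1 reading c_2
print(apply_edge(before, 5, "R", 3))  # [('c', 1), ('c', 5), ('q', 3), ('c', 3)]
print(apply_edge(before, 5, "L", 4))  # [('q', 4), ('c', 1), ('c', 5), ('c', 3)]
```

After either call the marker again sits just left of the cell T now reads, so the next iteration can branch on the character after it.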

The only way the UTM terminates execution is when the TMC(T) instruction is to move to
state q_z = q_2, which is the T HALT state. The UTM cannot quite halt yet itself because it
still has a q-marker on the data side of the $. This marker is the only q-symbol on this half
of the Tape. We run the Tape Head up, search, and destroy. Then we go to the UTM HALT.

This UTM has a large Tape alphabet: a million c’s, a million black q’s, a million blue q’s, an
L, an R, a #, and a $. It also has more than a trillion states. But it does exactly what we want
it to do. Without knowing what T is and what the data are (only knowing that the state names
of T have been changed to q’s, the character names on the T Tape have been changed to c’s,
and that there are at most a million of each), it correctly simulates the operation of the
machine T on this data.

We promised an explanation about what we should do to build a real UTM that accepted all
CWL words of TMs with an unbounded number of states and an unbounded number of
characters. In this case, instead of simply having a state q & c, we need to mark the whole q
and c field on the right side of the $ by making it blue and then crossing the $, moving left,
and searching for an identical substring corresponding to the encoding of the same state and
data. To mark an arbitrarily large substring of Tape cells and then search a specified range
(between # and $) for the identical substring is not hard TM programming, and we could have
proven this theorem that way. But the approach we took is slicker and more intuitive than a
mess of non-mnemonic a’s and b’s. But once we have understood our machine, it is clear that
UTMs do exist, not just that there are rumors of them having been sighted circling the skies
in remote places. ■

By the way, aren’t there a great many similarities between a UTM and a computer? We 
could have made the analogy even closer. We could have numbered (i.e., addressed) the cells in 
memory and the cells in the program section by inserting fixed-length bit codes in front of 
them. We could have set aside some register space, especially including an instruction counter 
instead of blue paint to remember where we are in the program. Then we could have used an 
address bus and a data bus to turn the TM’s linear memory into random access memory. But all 
these are relatively minor variations. The basic work of simulating a varying set of instructions 
on arbitrary data by employing a fixed procedure was all worked out in the UTM by Turing. 

NOT ALL r.e. LANGUAGES ARE RECURSIVE 

Now that we have designed the UTM, we may use it to settle some questions about recursively
enumerable languages, which is what Turing did initially.

We have already defined the language ALAN as all CWL words that are not accepted by 
the TMs they might represent. Let us now consider the other side of the coin. 
DEFINITION

Let MATHISON be the language of all CWL words that do represent TMs and are accepted by
the very machines they represent. (Mathison was Turing’s middle name, so do not seek any
further mathematical interpretation.) ■

THEOREM 66 

MATHISON is recursively enumerable. 

PROOF 

The TM that accepts MATHISON is very much like our UTM, but it has an initializing
subprogram. We start with an input string and then convert the Tape to



We now run the UTM program exactly as written above. If it ends in a HALT, then we 
know that the original input was accepted when run on the TM it represents. 

It is conceivable that some arbitrary input string that did not really represent a TM could
somehow trick a UTM into accepting itself. In fact, it is easy to see how this might happen.
The input might be the encoding of a nondeterministic TM, and the UTM might find a path to
HALT without realizing the input was bogus. Alternatively, the input might have some
semblance of a TM code word but include a garbage subsequence that luckily did not get in the
way of the UTM search for states and edges on its way to HALT. In order to avoid these cases,
we need a prescreening subprogram to check the input string to be sure that it is in the
correct form of a deterministic TM. Because CWL is a regular language, we know there is a TM
that accepts it (Theorem 46, p. 445), and then all that need be checked further is the
existence of moves out of the HALT state and the possibility of nondeterministic branching,
all of which is elementary TM programming and, hence, so trivial for us that we need not
bother making a further issue of it.

Once we know that the input is, in fact, a code word for a TM, the procedure above will 
halt when and only when the input is a word in MATHISON. ■ 

THEOREM 67 

The complement of a recursively enumerable language might not be recursively enumerable. 

PROOF 

Because CWL is a regular language, its complement CWL' is also regular. Because CWL' is
regular, it is also recursively enumerable. The union of CWL' and MATHISON is therefore the
union of two r.e. languages and so is r.e. itself. Call this language L:

L = CWL' + MATHISON

L is r.e., but its complement is ALAN (a word outside L is a CWL word that is not accepted by
the machine it represents), which is not r.e. ■
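The step "the union of two r.e. languages is r.e." needs some care, because a TM for a union cannot simply run one acceptor to completion before starting the other: the first might loop forever on a word that the second accepts. The remedy, in the spirit of the simultaneous simulation of Theorem 61, is to alternate single steps. In the sketch below each acceptor is modeled (our assumption) as a Python generator that yields once per step and finally returns True to accept or False to crash; `union_accepts` and the sample acceptors are hypothetical names.

```python
from typing import Callable, Generator, List, Optional

# Assumption: an acceptor is a generator function that yields once per step
# and finally returns True (accept) or False (crash); it may also run forever.
Acceptor = Callable[[str], Generator[None, None, bool]]


def union_accepts(a1: Acceptor, a2: Acceptor, w: str,
                  max_rounds: Optional[int] = None) -> Optional[bool]:
    """Alternate single steps of a1 and a2 on w; accept as soon as either accepts.

    Returns False if both crash, and None if max_rounds runs out first.
    With max_rounds=None it may run forever, exactly like a TM acceptor.
    """
    runs: List[Generator[None, None, bool]] = [a1(w), a2(w)]
    rounds = 0
    while runs and (max_rounds is None or rounds < max_rounds):
        for run in list(runs):
            try:
                next(run)  # advance this acceptor by one step
            except StopIteration as done:
                if done.value:
                    return True  # one of them accepted w
                runs.remove(run)  # this one crashed; keep the other
        rounds += 1
    return False if not runs else None


def even_length(w: str):
    for _ in w:
        yield
    return len(w) % 2 == 0


def loops_unless_starts_with_a(w: str):
    if w.startswith("a"):
        return True
    while True:
        yield
```

On the word bb the second acceptor loops forever, yet `union_accepts(even_length, loops_unless_starts_with_a, "bb")` returns True after three rounds, because the first acceptor is never starved.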
THEOREM 68

There are recursively enumerable languages that are not recursive. 

PROOF 

The language L just defined is not recursive, because if it were, its complement ALAN = L'
would be recursive too, and hence r.e., which by p. 551 it is not. ■

DECIDABILITY

We have answered some of the usual questions about languages for the class of r.e.
languages, and some others will be answered in the next chapter. What we face now is the
question of membership for a language defined by a TM.

Suppose we are given an input string w and a TM T. Can we tell whether or not T halts 
on w? This is called the halting problem for TMs. If the answer were “yes,” this question 
probably would not have a name, merely a theorem number. We shall indeed prove that there 
is no such decision procedure in our idiosyncratic sense of that term. 

To the suggestion, “Why don’t we just run w on T and see what happens?”, the answer is that
this proposal works only sometimes: T might halt or crash while we are watching, or it might
keep on running for a long time. It may run so long that we begin to suspect that w is in
loop(T), but suspecting so does not make it so. T might run for seven years and then decide
to accept w.

Because we have been claiming that TMs can execute any mathematical algorithm, what we
would expect to find as a halting problem decision procedure is a special TM. Into this
special machine we place w and T (encoded, of course) and out comes the answer of whether T
accepts w. The UTM is not our solution because all that will do is simulate T; we need
something better. The hope of converting T itself into a machine that never loops is doomed
because if we could always do that for any TM, all recursively enumerable languages would be
recursive, which we know they are not. So, what then is the answer?


THEOREM 69 

There is no TM that can accept any string w and any coded TM T and always decide correctly
whether T halts on w. In other words, the halting problem cannot be decided by a TM.
PROOF

Suppose for a moment that there was a TM that answers the halting problem. Let us call this
machine HP. If we feed HP the CWL code for any TM T and then a # followed by any input
string w, HP will, in finite time, halt itself and print out “yes” somewhere on its Tape if T
halts on w and “no” if it does not.

Let us modify HP as follows. Let us make it loop forever if it were about to print “yes” and
halt. We could do this by taking whatever section of the program was about to print the final
s of “yes” and making it loop instead. For those pairs of inputs for which it was going to
print “no,” we make no modification.

Now we stick a subprogram, acting as a preprocessor, onto the front of the HP program. This
preprocessor takes the left-of-# part of the input string and decides whether it is a word in
CWL. If the input is not, the preprocessor crashes. If it is, then the preprocessor deletes the
w part of the original input, puts two copies of this code word onto the Tape, separated by a
#, and feeds them into the main HP program. This means that HP is going to analyze whether
the code word that gets past the preprocessor is an encoded TM that accepts its own code word
as an input. If the answer is “yes,” then the modified machine loops forever. If the answer is
“no,” then it prints “no” and halts. In other words, regardless of what slanders are printed
on the Tape, this modified HP halts only on those inputs that are code words of TMs which do
not accept their own code word as input. Therefore, this modified HP accepts exactly the
language ALAN. But ALAN is not r.e. This contradiction disproves the assumption that there
exists a TM to decide the halting problem. ■
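The shape of this argument can be checked mechanically in a toy setting. In the sketch below, which is our illustration and not a model of TMs, a "program" is a Python function that returns the label `"HALT"` or the sentinel `"LOOP"` (standing in for running forever, which a test cannot actually do), and a claimed halting decider `hp` is any function of a program and an input that returns True for "yes." The hypothetical `make_modified_hp` builds the modified HP of the proof, fed two copies of the same code: it loops where `hp` would say "yes" and halts where `hp` would say "no." Every candidate `hp` then answers wrongly about the modified machine run on itself.

```python
from typing import Callable

# Toy conventions (assumptions): a program returns "HALT" or "LOOP", where
# "LOOP" is a sentinel meaning "would run forever"; a decider returns True for "yes".
Program = Callable[[object], str]
Decider = Callable[[Program, object], bool]


def make_modified_hp(hp: Decider) -> Program:
    """Loop where hp says "yes" (halts), halt where hp says "no"."""
    def modified(p: Program) -> str:
        return "LOOP" if hp(p, p) else "HALT"
    return modified


def hp_is_wrong_on_itself(hp: Decider) -> bool:
    """Does hp mispredict the modified machine run on its own code?"""
    m = make_modified_hp(hp)
    predicted_to_halt = hp(m, m)
    actually_halts = m(m) == "HALT"
    return predicted_to_halt != actually_halts


candidates = [
    lambda p, x: True,        # always answers "yes"
    lambda p, x: False,       # always answers "no"
    lambda p, x: p is not x,  # "yes" except on self-application
]
print([hp_is_wrong_on_itself(hp) for hp in candidates])  # [True, True, True]
```

The check cannot fail for any `hp`, because the modified machine is defined to do the opposite of whatever `hp` predicts about it.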

As if this situation were not bad enough, even more is true. 

THEOREM 70 

There is no TM that can decide, for every TM T fed into it in encoded form, whether or not
T accepts the word Λ.

PROOF 

Suppose, for a moment, there was such a machine called LAMBDA. That is, for all TMs T, when
we feed the code for T into LAMBDA, it prints out “yes” if Λ is accepted by T and “no” if Λ is
not. We shall now prove that such a machine cannot exist by demonstrating how, by employing
it, we could answer the halting problem by building a successful machine HP.

We can build HP in this fashion. HP, remember, is fed an encoded TM program for T and a word
w and is asked to decide whether T halts on w. The first thing that HP will do is create a new
TM, in encoded form, out of T and w. Basically, what it will do is modify T by attaching a
subprogram preprocessor that writes w on an empty Tape. This new TM (preprocessor + T) will
be called T*. HP does not write the word w anywhere, nor does it run the machine T. What it
does is take the letters of w = w_1 w_2 w_3 ... and automatically construct a set of new TM
states, connected in a line with edges labeled (Δ, w_1, R), (Δ, w_2, R), (Δ, w_3, R), ....
This then is the preprocessor subprogram. HP now encodes the preprocessor and concatenates
it with the code it was given for T to obtain the code word for T*.

With T* constructed like this, it is clear that the only word T* can possibly accept is Λ,
because all other inputs would crash in the preprocessor stage. Not only that, but T* can only
accept Λ if after w is put on the Tape and the machine runs like T, then T accepts w. In fact,
T* accepts Λ if and only if T accepts w.

Now this clever old HP has, by modifying the code of T into the code for T*, reduced the
question it was supposed to answer into a question the machine LAMBDA can answer. So, the
next section of the HP program is to act like LAMBDA on the code for T*. This will print out
“yes” or “no,” whichever is the truth about Λ for T*, which will also be the answer for w and
T. Therefore, if LAMBDA exists, then HP exists. But HP does not exist. ■
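The construction of T* can be mirrored at the level of input/output behavior. In this toy sketch (not TM code), a machine is assumed to be a Python function from an input word to one of the labels `"ACCEPT"`, `"CRASH"`, or `"LOOP"`, and the hypothetical `make_t_star` returns T*: any nonblank input crashes at the first preprocessor edge (Δ, w_1, R), while the blank Tape leads to T running on w. We assume, as the text does implicitly, that the Tape Head is back in cell i when T starts. For the degenerate case w = Λ the preprocessor has no states, so T* is just T. The helper `preprocessor_edges` lists the edges one per letter of w; the state names p0, p1, ... are our own.

```python
from typing import Callable, List, Tuple

BLANK = "Δ"
# Toy convention (assumption): a machine maps an input word to "ACCEPT", "CRASH", or "LOOP".
Machine = Callable[[str], str]


def preprocessor_edges(w: str) -> List[Tuple[str, str, str, str, str]]:
    """One edge (Δ, w_i, R) per letter of w, chained through new states p0, p1, ..."""
    return [(f"p{i}", BLANK, letter, "R", f"p{i + 1}") for i, letter in enumerate(w)]


def make_t_star(t: Machine, w: str) -> Machine:
    """Behavior of T*: the preprocessor writes w on an empty Tape, then T runs."""
    def t_star(x: str) -> str:
        if not w:
            return t(x)  # no preprocessor states: T* is just T
        if x:
            return "CRASH"  # cell i is not blank, so edge (Δ, w_1, R) cannot be taken
        return t(w)  # from here on T* behaves exactly like T started on w
    return t_star


def halts_via_lambda(lambda_decider: Callable[[Machine], bool], t: Machine, w: str) -> bool:
    """The reduction: the answer to HP(T, w) is LAMBDA(T*)."""
    return lambda_decider(make_t_star(t, w))
```

To exercise the reduction we may stand in for LAMBDA with `lambda m: m("") == "ACCEPT"`, which works only because toy machines return a label instead of actually running forever; no such stand-in exists for real TMs, which is the point of the theorem.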

So, not only can we not determine whether T accepts a given arbitrary word w, we cannot even
tell whether, when started on an empty Tape (i.e., the input Λ), it will halt. This is
sometimes called the blank tape problem, and it too is unsolvable by TM.

Given how little success we are having deciding things about TMs by TM, the next result
should be no surprise.

THEOREM 71 

There is no TM that, when fed the code word for an arbitrary TM, can always decide 
whether the encoded TM accepts any words. In other words, the emptiness question for r.e. 
languages cannot be decided by TM. 

PROOF 

We shall prove this by a method analogous to that used in the last proof. We shall assume
that there is such a TM, call it NOTEMPTY, that can decide, for any TM T* fed into it,
whether T* accepts any words, and that prints out “yes” or “no” accordingly. From this TM
NOTEMPTY, we shall be able to construct a working model of LAMBDA. Because LAMBDA cannot
exist, we can conclude that NOTEMPTY cannot exist either.

We can build LAMBDA in the following way. Let us say that LAMBDA is fed the encoded TM T
and asked whether it halts on a blank Tape. What LAMBDA does is attach to T a preprocessor
subprogram that erases any input that happens to be on the Tape. This preprocessor is
essentially the loop (any non-Δ, Δ, R). It is important that it only erase the input (the
non-Δ part of the Tape) and not loop forever. It then leaves the Tape Head in cell i. When it
has finished attaching this preprocessor to T, it determines the new code word for the joint
machine, called T*, and feeds this into NOTEMPTY. If the language of T* is not empty, this
means that T* accepts some words. In the operation of T*, these words would first be erased
and then T run on the blank Tape that remains. In other words, if T* accepts anything, then T
accepts Λ. And if T accepts Λ, then T* accepts everything. So, the LAMBDA machine can be
built from the NOTEMPTY machine, if the latter existed. ■
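The erasing preprocessor makes the language of T* all-or-nothing, which is what both this proof and the next theorem rely on. In the same toy convention as before (a machine is assumed to be a function returning `"ACCEPT"`, `"CRASH"`, or `"LOOP"`; the names `make_erasing_t_star` and `accepted_sample` are hypothetical), T* ignores its input and runs T on the blank Tape, so on any finite sample of words it accepts either every word or none.

```python
from typing import Callable, Iterable, Set

# Toy convention (assumption): a machine maps a word to "ACCEPT", "CRASH", or "LOOP".
Machine = Callable[[str], str]


def make_erasing_t_star(t: Machine) -> Machine:
    """T*: the loop (any non-Δ, Δ, R) erases the input, then T runs on a blank Tape."""
    def t_star(x: str) -> str:
        return t("")  # whatever x was, T only ever sees the blank Tape
    return t_star


def accepted_sample(m: Machine, words: Iterable[str]) -> Set[str]:
    """The words from a finite sample that m accepts."""
    return {x for x in words if m(x) == "ACCEPT"}


sample = ["", "a", "b", "ab", "bba"]
accepts_lambda = lambda s: "ACCEPT" if s == "" else "CRASH"
rejects_lambda = lambda s: "LOOP" if s == "" else "ACCEPT"
print(sorted(accepted_sample(make_erasing_t_star(accepts_lambda), sample)))  # ['', 'a', 'ab', 'b', 'bba']
print(accepted_sample(make_erasing_t_star(rejects_lambda), sample))  # set()
```

The second machine accepts plenty of nonempty words on its own, but its T* accepts none of them, because only T’s behavior on Λ matters after erasing.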

The construction in the last proof actually shows that LAMBDA would exist if there were a TM
that could determine whether the language accepted by a given TM is infinite, because the
language of T* is empty or infinite depending on whether T accepts Λ. Because LAMBDA does
not exist, the machine to decide finiteness also cannot exist. Thus, we have actually proven
this result.

THEOREM 72 

There does not exist a TM that can decide, for any encoded TM T fed into it, whether the
language of T is finite or infinite. ■

We have been careful in the last three theorems to say that membership, Λ, and emptiness are
all not decidable by a TM. We did not have the nerve yet to claim that these questions could
not be decided by any means. That time, however, is approaching.
PROBLEMS

1. Show that each of the following languages is recursive by finding a TM that accepts 
them and crashes on all strings in their complement: 

(i) EVEN-EVEN 

(ii) EQUAL 

(iii) ODDPALINDROME 

(iv) TRAILINGCOUNT 

(v) MOREA 

Consider the following TMs for Problems 2 through 4: 

[Figure: state diagrams of TMs T_1 and T_2]

2. What are accept(T_1), loop(T_1), and reject(T_1)? Be careful about the word b.

3. What are accept(T_2), loop(T_2), and reject(T_2)?

4. Draw the TM that accepts the language

accept(T_1) + accept(T_2)

5. Trace the execution of these input strings on the machine of Problem 4: 

(i) Λ

(ii) b 

(iii) aab 

(iv) ab 

6. Prove that all regular languages are recursive. 

7. Prove that all CFLs are recursive. 

8. Prove that if L, M, and N are three r.e. languages such that no two have a word in common
yet their union is all possible strings, then they are all recursive.

9. Let L be a language and L' its complement. Prove that one of the following cases must 
be true: 

(i) Both L and L' are recursive. 

(ii) Neither L nor L' is r.e.

(iii) One is r.e. but not recursive while the other is not r.e. 

10. (i) Prove that the union of two recursive languages is recursive. 

(ii) Prove that the intersection of two recursive languages is recursive.

11. Suppose that L is r.e. but not recursive and that T accepts L. Prove that loop(T) is infinite.

12. Using nondeterministic TMs, show that the product and Kleene closure of r.e. languages 
are r.e. 

13. Convert the following TMs first into summary tables and then into their code words in 
CWL. What are the six languages accepted by these TMs? 


[Figure: the six TMs for Problem 13]
Run each of the six encoded words on their respective machines to see which are in the
language ALAN.

14. Can the code word for any TM be a palindrome? Prove your answer. 

15. Decode the following words from CWL into their corresponding TMs and run them
on their corresponding TMs to see which are in ALAN and which are in MATHISON:

(i) abaabbbbab 

(ii) abaabaabba 

(iii) abaabaabbb 

(iv) abaaabaaabbaaabaababbbb 

(v) abaaabaaabaaaabaababbab 

(vi) ababababab 

16. Outline a TM that accepts only CWL words that actually are encoded TMs. 

17. In Chapter 11 (just before Theorem 17), the blue paint method was presented to
determine whether an FA accepts any words at all. Using the TM depicted below, show that
this method fails to decide whether a TM accepts any words:

[Figure: a TM with states START, two intermediate states, and HALT; its edge labels include (any,b,R), (any,=,L), and (a,a,R)]

18. Given a TM, T₁, and any string w, there is clearly a TM, T₂, that first screens its input to
see whether it is the particular string w; if it is not, the input is accepted; if it is w, then T₁
is run on the input w. Pictorially,



Show that there is no decision procedure to determine whether any given TM (say T₂)
accepts all strings or not.

19. Show that there is no TM that can decide, given code(T₁) and code(T₂), whether
accept(T₁) = accept(T₂). Hint: Choose a T₂ such that this problem reduces to the
ACCEPTALL machine of the previous problem.

20. (Oddly enough, this problem has nothing to do with computer theory, yet it has
everything to do with the contents of this chapter.)

In the English language, we can observe that some adjectives apply to themselves.
For example, the word “short” is a fairly short word. We might say, “short” is short.
Also, the adjective “polysyllabic” is indeed polysyllabic. Some other possible adjectives
of this type are “unfrequent,” “melodious,” “arcane,” “unhyphenated,” “English,”
“nonpalindromic,” and “harmless.” Let us call all these adjectives that describe themselves
homothetic. Let us call all other adjectives (those that do not describe themselves)
heterothetic. For example, the words “gymnastic,” “myopic,” and “recursive” are all
heterothetic adjectives. The word “heterothetic” is an adjective and therefore like all
adjectives it is either homothetic or heterothetic. Which is it?


CHAPTER 24


The Chomsky Hierarchy


PHRASE-STRUCTURE GRAMMARS 

We have not yet developed all the information presented in the table at the beginning 
of Chapter 19. For one thing, we have not discovered the language structures that 
define recursively enumerable sets independent of the concept of TMs. This we shall do 
now. 

Why are context-free languages called “context-free”? The answer is that if there is a
production A → t, where A is a nonterminal and t is a terminal, then the replacement of t
for A can be made in any situation in any working string. This gave us the uncomfortable
problem of the itchy itchy itchy bear in Chapter 12. It could give us even worse problems.

As an example, we could say that in English the word “base” can mean cowardly,
whereas “ball” can mean a dance. If we employ the CFG model, we could introduce the
productions


Base → cowardly
Ball → dance

and we could modify some working string as follows:

Baseball ⇒ cowardly dance

What is wrong here is that although base can sometimes mean cowardly, it does not
always have that option. In general, we have many synonyms for any English word; each is a
possibility for substitution:

Base → foundation | alkali | headquarters | safety station | cowardly | mean

However, it is not true in English that base can be replaced by any one of these words in 
each of the sentences in which it occurs. What matters is the context of the phrase in which 
the word appears. English is therefore not an example of a CFL. This is true even though, as 
we saw in Chapter 12, the model for context-free languages was originally abstracted from 
human language grammars. Still, in English we need more information before proceeding 
with a substitution. This information can be in the form of the knowledge of the adjoining 
words. 


Base line → starting point

Base metal → nonprecious metal

Way off base → very mistaken | far from home

Here, we are making use of some of the context in which the word sits to know which
substitutions are allowed, where by context we mean the immediately adjoining words in the
sentence. The term context could mean other things, such as the general topic of the
paragraph in which the phrase sits; however, for us context means some number of the
surrounding words.

Instead of replacing one character by a string of characters as in CFGs, we are now
considering replacing one whole string of characters (terminals and nonterminals) by another.
This is a new kind of production and it gives us a new kind of grammar. We carry over all
the terminology from CFGs such as “working string” and “the language generated.” The
only change is in the form of the productions. We are developing a new mathematical model
that more accurately describes the possible substitutions occurring in English and other
human languages. There is also a useful connection to computer theory, as we shall see.

DEFINITION 

A phrase-structure grammar is a collection of three things:

1. A finite alphabet Σ of letters called terminals.

2. A finite set of symbols called nonterminals that includes the start symbol S.

3. A finite list of productions of the form

string₁ → string₂

where string₁ can be any string of terminals and nonterminals that contains at least one
nonterminal and where string₂ is any string of terminals and nonterminals whatsoever.

A derivation in a phrase-structure grammar is a series of working strings beginning
with the start symbol S, which, by making substitutions according to the productions, arrives
at a string of all terminals, at which point generation must stop.

The language generated by a phrase-structure grammar is the set of all strings of
terminals that can be derived starting at S. ■
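The definition can be made concrete with a small sketch. The Python below assumes, purely for illustration, that every terminal and nonterminal is a single character, that the nonterminals are exactly the uppercase letters, and that Λ is the empty string; the helper names are hypothetical. A production may be applied wherever its entire left side occurs in the working string, and generation stops once no nonterminal remains.

```python
def is_nonterminal(symbol: str) -> bool:
    # Assumed convention for this sketch: uppercase letters are nonterminals.
    return symbol.isupper()


def is_phrase_structure_production(lhs: str, rhs: str) -> bool:
    # The left side must contain at least one nonterminal;
    # the right side may be anything, including "" (standing for Λ).
    return any(is_nonterminal(c) for c in lhs)


def one_step(working: str, productions: list[tuple[str, str]]) -> set[str]:
    """Every working string obtainable by applying one production once."""
    results = set()
    for lhs, rhs in productions:
        start = working.find(lhs)
        while start != -1:  # every occurrence, overlapping ones included
            results.add(working[:start] + rhs + working[start + len(lhs):])
            start = working.find(lhs, start + 1)
    return results


def is_word(working: str) -> bool:
    # Generation stops once no nonterminal is left in the working string.
    return not any(is_nonterminal(c) for c in working)


# The grammar of the next example, with "" standing for Λ.
grammar = [("S", "XS"), ("S", ""), ("X", "aX"), ("X", "a"), ("aaaX", "ba")]
assert all(is_phrase_structure_production(l, r) for l, r in grammar)
print(sorted(one_step("aaaaX", grammar)))
```

Note that a single left side such as aaaX may match at several places, so one working string can have several successors even under one production.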

EXAMPLE 

The following is a phrase-structure grammar over Σ = {a b} with nonterminals X and S:

Prod 1   S → XS | Λ
Prod 2   X → aX | a
Prod 3   aaaX → ba

This is an odd set of rules. The first production says that we can start with S and derive
any number of symbols of type X; for example,

S ⇒ XS
  ⇒ XXS
  ⇒ XXXS
  ⇒ XXXXS
  ⇒ XXXX

Phrase-Structure Grammars 


567 



The second production shows us that each X can be any string of a's (with at least one a):

X ⇒ aX
  ⇒ aaX
  ⇒ aaaX
  ⇒ aaaaX
  ⇒ aaaaa

The third production says that any time we find three a's and an X, we can replace these
four symbols with the two-terminal string ba.

The following is a summary of one possible derivation in this grammar:

S ⇒ XXXXXX
  ⇒ aaaaaXXXXX    (after X ⇒ aaaaa)
  ⇒ aabaXXXX      (by Prod 3)
  ⇒ aabaaaXXX     (after X ⇒ aa)
  ⇒ aabbaXX       (Prod 3)
  ⇒ aabbaaaX      (after X ⇒ aa)
  ⇒ aabbba        (after Prod 3) ■
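Each line of the summary above compresses several single steps. As a check, the sketch below writes out one fully expanded derivation of aabbba (our own expansion, under the same assumptions as before: one character per symbol, uppercase nonterminals, and Λ as the empty string) and asserts that each line follows from the one before it by exactly one production.

```python
GRAMMAR = [("S", "XS"), ("S", ""),      # Prod 1
           ("X", "aX"), ("X", "a"),     # Prod 2
           ("aaaX", "ba")]              # Prod 3


def one_step(working, productions):
    """All strings reachable from `working` by applying one production once."""
    out = set()
    for lhs, rhs in productions:
        i = working.find(lhs)
        while i != -1:
            out.add(working[:i] + rhs + working[i + len(lhs):])
            i = working.find(lhs, i + 1)
    return out


DERIVATION = [
    "S", "XS", "XXS", "XXXS", "XXXXS", "XXXXXS", "XXXXXXS",
    "XXXXXX",                                   # S -> Λ
    "aXXXXXX", "aaXXXXXX", "aaaXXXXXX", "aaaaXXXXXX",
    "aaaaaXXXXX",                               # first X has become aaaaa
    "aabaXXXX",                                 # Prod 3
    "aabaaXXXX", "aabaaaXXX",                   # next X has become aa
    "aabbaXX",                                  # Prod 3
    "aabbaaXX", "aabbaaaX",                     # next X has become aa
    "aabbba",                                   # Prod 3
]

for before, after in zip(DERIVATION, DERIVATION[1:]):
    assert after in one_step(before, GRAMMAR), (before, after)
print("valid derivation of", DERIVATION[-1])
```

Notice the two places where Prod 3 makes the working string shorter while removing terminals that were already present.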

This is certainly a horse of a different color. The algorithms that we used for CFGs must
now be thrown out the window. Chomsky Normal Form is out. Sometimes, applying a
production that is not a Λ-production still makes a working string get shorter. Terminals that
used to be in a working string can disappear. Leftmost derivations do not always exist. The
CYK algorithm does not apply. It is no longer possible just to read the list of nonterminals
off of the left sides of productions. We cannot tell the terminals from the nonterminals
without a scorecard.

All CFGs are phrase-structure grammars in which we restrict ourselves as to what we
put on the left side of productions. So, all CFLs can be generated by phrase-structure
grammars. Can any other languages be generated by them?


THEOREM 73 

At least one language that cannot be generated by a CFG can be generated by a
phrase-structure grammar.


PROOF 

To prove this assertion by constructive methods, we need only demonstrate one actual
language with this property. A nonconstructive proof might be to show that the assumption

Phrase-structure grammar = CFG

leads to some devious contradiction but, as usual, we shall employ the preferred constructive
approach here. (Theorem 64 on p. 551 was proved by devious contradiction and see what
became of that.)

Consider the following phrase-structure grammar over the alphabet Σ = {a b}:

Prod 1   S → aSBA
Prod 2   S → abA
Prod 3   AB → BA
Prod 4   bB → bb
Prod 5   bA → ba
Prod 6   aA → aa

We shall show that the language generated by this grammar is $\{a^n b^n a^n\}$, which we have
shown in Chapter 16 is non-context-free.

First, let us see one example of a derivation in this grammar:

S ⇒ aSBA              Prod 1
  ⇒ aaSBABA           Prod 1
  ⇒ aaaSBABABA        Prod 1
  ⇒ aaaabABABABA      Prod 2
  ⇒ aaaabBAABABA      Prod 3
  ⇒ aaaabBABAABA      Prod 3
  ⇒ aaaabBBAAABA      Prod 3
  ⇒ aaaabBBAABAA      Prod 3
  ⇒ aaaabBBABAAA      Prod 3
  ⇒ aaaabBBBAAAA      Prod 3
  ⇒ aaaabbBBAAAA      Prod 4
  ⇒ aaaabbbBAAAA      Prod 4
  ⇒ aaaabbbbAAAA      Prod 4
  ⇒ aaaabbbbaAAA      Prod 5
  ⇒ aaaabbbbaaAA      Prod 6
  ⇒ aaaabbbbaaaA      Prod 6
  ⇒ aaaabbbbaaaa      Prod 6
  = $a^4 b^4 a^4$

To generate the word $a^m b^m a^m$ for some fixed number m (we have used n to mean any
power in the defining symbol for this language), we could proceed as follows.

First, we use Prod 1 exactly (m − 1) times. This gives us the working string

$a^{m-1}\, S\, (BA)^{m-1}$

that is, (m − 1) a's, then S, then (m − 1) B's alternating with (m − 1) A's.

Next, we apply Prod 2 once. This gives us the working string

$a^m\, b\, A\, (BA)^{m-1}$

that is, m a's, then b, then m A's alternating with (m − 1) B's.

Now we apply Prod 3 enough times to move the B's in front of the A's. Note that we
should not let our mathematical background fool us into thinking that AB → BA means that the
A's and B's commute. No. We cannot replace BA with AB; only the other way around. The
A's can move to the right through the B's. The B's can move to the left through the A's. We can
only separate them into the arrangement B's, then A's. We then obtain the working string


$a^m\, b\, B^{m-1} A^m$

Now, using Prods 4, 5, and 6, we can work through the working string from left to right,
converting B's to b's and then A's to a's.

We will finally obtain

$\underbrace{aa\ldots a}_{m}\,\underbrace{bb\ldots b}_{m}\,\underbrace{aa\ldots a}_{m} = a^m b^m a^m$
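The recipe above is mechanical enough to run. In the hypothetical Python sketch below, each production is applied at the leftmost place its left side occurs; that choice is ours, not the text's, though for m = 4 it reproduces the derivation of $a^4 b^4 a^4$ shown earlier line for line. Symbols are single characters, as before.

```python
PRODS = {"P1": ("S", "aSBA"), "P2": ("S", "abA"), "P3": ("AB", "BA"),
         "P4": ("bB", "bb"), "P5": ("bA", "ba"), "P6": ("aA", "aa")}


def apply_leftmost(working: str, name: str) -> str:
    """Apply the named production at the leftmost place its left side occurs."""
    lhs, rhs = PRODS[name]
    i = working.find(lhs)
    if i == -1:
        raise ValueError(f"{name} does not apply to {working!r}")
    return working[:i] + rhs + working[i + len(lhs):]


def derive(m: int) -> list[str]:
    """Derivation of a^m b^m a^m following the recipe in the text (m >= 1)."""
    steps = ["S"]

    def use(name: str) -> None:
        steps.append(apply_leftmost(steps[-1], name))

    for _ in range(m - 1):          # Prod 1, (m - 1) times
        use("P1")
    use("P2")                       # Prod 2, once
    while "AB" in steps[-1]:        # Prod 3 until every B precedes every A
        use("P3")
    while "bB" in steps[-1]:        # Prod 4: convert the B's, left to right
        use("P4")
    use("P5")                       # Prod 5: the first A becomes a
    while "aA" in steps[-1]:        # Prod 6: convert the remaining A's
        use("P6")
    return steps


for line in derive(4):
    print(line)
```

Each Prod 3 step removes one A-before-B inversion, so the sorting loop always terminates.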

We have not yet proven that $\{a^n b^n a^n\}$ is the language generated by the original grammar,
only that all such words can be derived. To finish the proof, we must show that no word not
in $\{a^n b^n a^n\}$ can be generated. We must show that every word that is generated is of the form
$a^n b^n a^n$ for some n.

Let us consider some unknown derivation in this phrase-structure grammar. We begin
with the start symbol S and we must immediately apply either Prod 1 or Prod 2. If we start
with Prod 2, the only word we can generate is aba, which is of the approved form.

If we begin with Prod 1, we get the working string

aSBA

which is of the form

(some a's) S (equal numbers of A's and B's)

The only productions we can apply are Prods 1, 2, and 3, because we do not yet have any
substrings of the form bB, bA, or aA. Prods 1 and 3 leave the form just as above, whereas
once we use Prod 2, we immediately obtain a working string of the form

(some a's) abA (equal numbers of A's and B's)

If we never apply Prod 2, we never remove the character S from the working string and 
therefore we never obtain a word. Prod 2 can be applied only one time, because there is 
never more than one S in the working string. 

Therefore, in every derivation before we have applied Prod 2, we have applied some
(maybe no) Prod 1's and Prod 3's. Let the number of Prod 1's we have applied be m. We
shall now demonstrate that the final word generated must be

$a^{m+1} b^{m+1} a^{m+1}$

Right after Prod 2 is applied, the working string looks like this:

(exactly m a's) abA (exactly m A's and m B's, in some order)

The only productions we can apply now are Prods 3, 4, 5, and 6. Let us look at the
working string this way:

$a^{m+1}\, b$ (nonterminals: (m + 1) A's and m B's)

Any time we apply Prod 3, we are just scrambling the right half of the string, the
sequence of nonterminals. When we apply Prod 4, 5, or 6, we are converting a nonterminal
into a terminal, but it must be the nonterminal on the border between the left-side terminal
string and the right-side nonterminal string. We always keep the shape

(terminals)(nonterminals)

(just as with leftmost Chomsky derivations), until we have all terminals. The A's eventually
become a's and the B's eventually become b's. However, none of the rules for Prods 4, 5, or
6 can create the substring ab. We can create bb, ba, or aa, but never ab. From this point on,
the pool of A's and B's will be converted into a's and b's without the substring ab. That
means it must eventually assume the form $b^* a^*$. Because the pool holds exactly m B's and
(m + 1) A's,

$a^{m+1}\, b$ (nonterminals: (m + 1) A's and m B's)

must become

$a^{m+1}\, b\, b^m a^{m+1} = a^{m+1} b^{m+1} a^{m+1}$


which is what we wanted to prove. ■ 
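The argument above can be spot-checked by brute force. This sketch (hypothetical Python, single-character symbols, uppercase nonterminals) enumerates every word of length at most 12 that the grammar generates. Discarding working strings longer than the bound is safe here only because no production of this grammar shortens the working string; for a grammar with shrinking productions, such as the first example in this section, this pruning would not be valid.

```python
from collections import deque

PRODUCTIONS = [("S", "aSBA"), ("S", "abA"), ("AB", "BA"),
               ("bB", "bb"), ("bA", "ba"), ("aA", "aa")]


def one_step(working):
    """Yield every one-production successor of the working string."""
    for lhs, rhs in PRODUCTIONS:
        i = working.find(lhs)
        while i != -1:
            yield working[:i] + rhs + working[i + len(lhs):]
            i = working.find(lhs, i + 1)


def words_up_to(max_len):
    """All words of length <= max_len generated from S.

    Sound only for grammars in which every right side is at least as long
    as its left side, so working strings never shrink.
    """
    seen = {"S"}
    frontier = deque(["S"])
    words = set()
    while frontier:
        working = frontier.popleft()
        if working.islower():          # no nonterminals left: a word
            words.add(working)
            continue
        for nxt in one_step(working):  # dead ends simply yield nothing
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return words


print(sorted(words_up_to(12), key=len))
```

Such a check only covers short words, of course; it supports the proof but does not replace it.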

As with CFGs, it is possible to define and construct a total language tree for a
phrase-structure grammar. To every node, we apply as many productions as we can along different
branches. Some branches lead to words; some may not. The total language tree for a
phrase-structure language may have very short words way out on very long branches (which is not
the case with CFLs). This is because productions can sometimes shorten the working string,
as in the example

S → aX
X → aX
aaaaaaX → b

The derivation for the word ab is

S ⇒ aX
  ⇒ aaX
  ⇒ aaaX
  ⇒ aaaaX
  ⇒ aaaaaX
  ⇒ aaaaaaX
  ⇒ aaaaaaaX
  ⇒ ab
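The depth at which a word first appears in the total language tree can be computed with a breadth-first walk. The sketch below assumes the grammar of this example reads S → aX, X → aX, aaaaaaX → b, with one character per symbol; the function name is hypothetical. It cuts the walk off at a fixed depth, since this tree is infinite.

```python
from collections import deque

# Assumed reading of the example grammar; one character per symbol.
PRODUCTIONS = [("S", "aX"), ("X", "aX"), ("aaaaaaX", "b")]


def one_step(working):
    """Yield every one-production successor of the working string."""
    for lhs, rhs in PRODUCTIONS:
        i = working.find(lhs)
        while i != -1:
            yield working[:i] + rhs + working[i + len(lhs):]
            i = working.find(lhs, i + 1)


def word_depths(max_depth):
    """Map each word found within max_depth steps to its depth in the tree."""
    depth = {"S": 0}
    queue = deque(["S"])
    words = {}
    while queue:
        working = queue.popleft()
        if not any(c.isupper() for c in working):   # all terminals: a word
            words[working] = depth[working]
            continue
        if depth[working] == max_depth:              # do not grow past the cutoff
            continue
        for nxt in one_step(working):
            if nxt not in depth:                     # BFS: first visit is shallowest
                depth[nxt] = depth[working] + 1
                queue.append(nxt)
    return words


print(word_depths(10))
```

The two-letter word ab sits eight levels down, below every working string of length up to eight.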


EXAMPLE 

The total language tree for the phrase-structure grammar for $\{a^n b^n a^n\}$ on p. 567 begins

S
  aSBA
    aaSBABA
      aaaSBABABA
      aaabABABA
      aaSBBAA
    aabABA
      aabBAA
      aabaBA   (dead end)
  abA
    aba




Notice one interesting thing that can happen in a phrase-structure grammar. A working 
string may contain nonterminals and yet no production can be applied to it. Such a working 
string is not a word in the language of the grammar; it is a dead end. ■ 
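The beginning of the tree above can be generated mechanically. This Python sketch (hypothetical helper names, one character per symbol, uppercase letters as nonterminals) lists the children of each node in production order and flags a node as a dead end when it still contains a nonterminal but no production applies.

```python
PRODUCTIONS = [("S", "aSBA"), ("S", "abA"), ("AB", "BA"),
               ("bB", "bb"), ("bA", "ba"), ("aA", "aa")]


def children(working):
    """Distinct one-step successors, in production order, then left to right."""
    kids = []
    for lhs, rhs in PRODUCTIONS:
        i = working.find(lhs)
        while i != -1:
            kid = working[:i] + rhs + working[i + len(lhs):]
            if kid not in kids:
                kids.append(kid)
            i = working.find(lhs, i + 1)
    return kids


def is_word(working):
    return working.islower()  # only lowercase terminals remain


def show_tree(working="S", depth=3, indent=0):
    kids = [] if is_word(working) else children(working)
    dead_end = not is_word(working) and not kids
    print(" " * indent + working + ("   (dead end)" if dead_end else ""))
    if depth > 0:
        for kid in kids:
            show_tree(kid, depth - 1, indent + 2)


show_tree()
```

In aabaBA no left side of any production occurs (there is no AB, bB, bA, or aA), which is why that branch ends.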

The phrase-structure languages (those languages generated by phrase-structure
grammars) are a larger class of languages than the CFLs. This is fine with us, because CFGs are
inadequate to describe all the languages accepted by TMs.

We found that the languages accepted by FAs are also those definable by regular
expressions and that the languages accepted by PDAs are also those definable by CFGs. What we need
now is some method of defining the languages accepted by TMs that does not make reference to
the machines themselves (simply calling them recursively enumerable contributes nothing to
our understanding). Perhaps phrase-structure languages are what we need. (Good guess.) Also,
because we already know that some languages cannot be accepted by TMs, perhaps we can find
a method of defining all possible languages, not just the r.e. languages. Although we have
placed very minimal restrictions on the shape of their productions, phrase-structure grammars
do not have to be totally unstructured, as we see from the following result.

THEOREM 74 

If we have a phrase-structure grammar that generates the language L, then there is another 
grammar that also generates L which has the same alphabet of terminals and in which each 
production is of the form 

string of nonterminals → string of terminals and nonterminals
(where the left side cannot be Λ, but the right side can).


PROOF 

This proof will be by constructive algorithm using the same trick as in the proof of Theorem 25.

Step 1 For each terminal a, b, . . . introduce a new nonterminal (one not used
before): A, B, . . . and change every string of terminals and nonterminals into a
string of nonterminals by using the new symbols. For example,

aSbXb → bbXYX

becomes

ASBXB → BBXYX

Step 2 Add the new productions

A → a
B → b

These replacements and additions obviously generate the same language and fit the
desired description. In fact, the new grammar fits a stronger requirement. Every production is
either

string of nonterminals → string of nonterminals

or

one nonterminal → one terminal

(where the right side can be Λ, but not the left side).
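The two steps of the construction translate directly into code. In this sketch the caller supplies the fresh nonterminal for each terminal, and the function refuses names that already occur in the grammar, which is the reason the worked example in the text uses X and Y rather than A and B. Single-character symbols and the function name are assumptions of the sketch.

```python
def to_nonterminal_left_sides(productions, fresh):
    """Sketch of the Theorem 74 construction.

    productions -- list of (lhs, rhs) pairs, one character per symbol
    fresh       -- dict giving each terminal its new nonterminal, e.g. {"a": "X"}
    """
    used = {c for lhs, rhs in productions for c in lhs + rhs}
    for terminal, name in fresh.items():
        if name in used:
            raise ValueError(f"{name!r} is already in use; choose another name for {terminal!r}")

    def rename(s):
        return "".join(fresh.get(c, c) for c in s)

    # Step 1: rewrite every production with the new symbols.
    result = [(rename(lhs), rename(rhs)) for lhs, rhs in productions]
    # Step 2: add one production (new nonterminal -> terminal) per terminal.
    result += [(name, terminal) for terminal, name in fresh.items()]
    return result


grammar = [("S", "aSBA"), ("S", "abA"), ("AB", "BA"),
           ("bB", "bb"), ("bA", "ba"), ("aA", "aa")]
for lhs, rhs in to_nonterminal_left_sides(grammar, {"a": "X", "b": "Y"}):
    print(lhs, "->", rhs or "Λ")
```

After the conversion, every left side is made only of nonterminals, and terminals appear only on the right side of the added rules.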
EXAMPLE

The phrase-structure grammar over the alphabet {a b}, which generates $\{a^n b^n a^n\}$, which
we saw above,

S → aSBA
S → abA
AB → BA
bB → bb
bA → ba
aA → aa

turns into the following, when the algorithm of Theorem 74 is applied to it:

S → XSBA
S → XYA
AB → BA
YB → YY
YA → YX
XA → XX
X → a
Y → b

Notice that we had to choose new symbols, X and Y, because A and B were already being
employed as nonterminals.


DEFINITION 

A phrase-structure grammar is called type 0 if each production is of the form

nonempty string of nonterminals → any string of terminals and nonterminals ■

The second grammar above is type 0. Actually, what we have shown by Theorem 74 is 
that all phrase-structure grammars are equivalent to type 0 grammars in the sense that they 
generate the same languages. 

Some authors define type 0 grammars by exactly the same definition as we gave for 
phrase-structure grammars. Now that we have proven Theorem 74, we may join the others 
and use the two terms interchangeably, forgetting our original definition of type 0 as distinct 
from phrase-structure. As usual, the literature on this subject contains even more terms for 
the same grammars, such as unrestricted grammars and semi-Thue grammars. 

Beware of the sloppy definition that says type 0 includes all productions of the form

any string → any string

because that would allow one string of terminals (on the left) to be replaced by some
other string (on the right). This goes against the philosophy of what a terminal is, and we
do not allow it. Nor do we allow frightening productions of the form Λ → something,
which could cause letters to pop into words indiscriminately (see Gen. 1:3 for
“Λ → light”).

Names such as nonterminal-rewriting grammars and context-sensitive-with-erasing
grammars also turn out to generate the same languages as type 0. These names reflect other
nuances of formal language theory into which we do not delve.

One last remark about the name type 0. It is not pronounced like the universal blood 
donor but rather as “type zero.” The 0 is a number, and there are other numbered types. 

Type 0 is one of the four classes of grammars that Chomsky, in 1959, cataloged in a
hierarchy of grammars according to the structure of their productions.


The Chomsky Hierarchy of Grammars

Type  Name of Languages Generated  Production Restrictions X → Y         Acceptor

0     Phrase-structure =           X = any string with nonterminals      TM
      recursively enumerable       Y = any string

1     Context-sensitive            X = any string with nonterminals      TMs with bounded (not infinite)
                                   Y = any string as long as or longer   Tape, called linear-bounded
                                       than X                            automata, LBAs*

2     Context-free                 X = one nonterminal                   PDA
                                   Y = any string

3     Regular                      X = one nonterminal                   FA
                                   Y = tN or Y = t, where t is
                                       terminal and N is nonterminal

*The size of the Tape is a linear function of the length of the input; cf. Problem 20.
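The table's restrictions can be checked production by production. The sketch below (hypothetical Python; uppercase letters are nonterminals, Λ is the empty string, one character per symbol) reports the largest type number whose restriction, exactly as stated in the table, every production meets. It classifies the grammar, not the language: a language generated by a type 0 grammar may also have a more restricted grammar.

```python
def is_nonterminal(c):
    return c.isupper()      # assumed convention: uppercase = nonterminal


def has_nonterminal(s):
    return any(is_nonterminal(c) for c in s)


def rule_types(lhs, rhs):
    """The table rows (0-3) whose restriction this one production meets."""
    types = set()
    if has_nonterminal(lhs):
        types.add(0)                                  # X contains a nonterminal
        if len(rhs) >= len(lhs):
            types.add(1)                              # Y at least as long as X
    if len(lhs) == 1 and is_nonterminal(lhs):
        types.add(2)                                  # X is one nonterminal
        if (len(rhs) == 1 and not is_nonterminal(rhs)) or (
                len(rhs) == 2 and not is_nonterminal(rhs[0])
                and is_nonterminal(rhs[1])):
            types.add(3)                              # Y = t or Y = tN
    return types


def chomsky_type(productions):
    """Largest type number met by every production, or None if none is."""
    common = {0, 1, 2, 3}
    for lhs, rhs in productions:
        common &= rule_types(lhs, rhs)
    return max(common) if common else None


anbnan = [("S", "aSBA"), ("S", "abA"), ("AB", "BA"),
          ("bB", "bb"), ("bA", "ba"), ("aA", "aa")]
print(chomsky_type(anbnan))
```

The $\{a^n b^n a^n\}$ grammar comes out as type 1, in line with the remark later in this section that it meets the conditions for context sensitivity.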


We have not yet proven all the claims on this table, nor shall we. We have completely
covered the cases of type 2 and type 3 grammars. Type 1 grammars are called
context-sensitive because they use some information about the context of a nonterminal before
allowing a substitution. However, they require that no production shorten the length of the
working string, which enables us to use the top-down parsing techniques discussed in
Chapter 18. Because they are very specialized, we treat them only briefly (cf. p. 588). In
this chapter, we prove the theorem that type 0 grammars generate all recursively
enumerable languages.

Two interesting classes of languages are not on this chart. One is the set of all languages
that can be accepted by deterministic PDAs, called simply the deterministic context-free
languages. We have seen that they are closed under complementation, which makes more
questions decidable. They are generated by what are called LR(k) grammars, which are
grammars generating words that can be parsed by being read from left to right, looking
ahead k symbols at a time. This is a topic of special interest to compiler designers. This
book is only an introduction and does not begin to exhaust the range of what a computer
scientist needs to know about theory to be a competent practitioner.

The other interesting class of languages that is missing is the collection of recursive
languages. No algorithm can, by looking only at the structure of the grammar, tell whether the
language it generates is recursive: not by counting the symbols, not by describing the
production strings, nothing.

These six classes of languages form a nested set as shown in this Venn
diagram:

[Venn diagram: regular ⊂ deterministic context-free ⊂ context-free ⊂ context-sensitive ⊂ recursive ⊂ recursively enumerable]

We have discussed most of the examples that show no two of these categories are
really the same. This is important: just because a condition looks more restrictive does not
mean it actually is, in the sense that different languages fulfill it. Remember that FA = NFA.

$\{a^n b^n\}$ is deterministic context-free but not regular.

The complement of $\{a^n b^n a^n\}$ is a CFL, but it cannot be accepted by a DPDA.

$\{a^n b^n a^n\}$ is context-sensitive but not context-free. (The grammar we just examined
above that generates this language meets the conditions for context sensitivity.)

L stands for a language that is recursive but not context-sensitive. We shall present one
of these on p. 590.

MATHISON is recursively enumerable but not recursive.

ALAN comes from outerspace.

Counting “outerspace,” we actually have seven classes of languages. The language of all
computer program instructions is context-free; however, the language of all computer
programs themselves is r.e. English is probably context-sensitive except for poetry, which (as
e.e. cummings proved in 1923) is from outerspace.


TYPE 0 = TM

We shall now prove that r.e. = type 0. This was first demonstrated by Chomsky in 1959. The
proof will be given in two parts: Theorem 75 and Theorem 76.


THEOREM 75 

If L is generated by a type 0 grammar G, then there is a TM that accepts L. 


PROOF 

The proof will be by constructive algorithm. We shall describe how to build such a TM. This
TM will be nondeterministic, and we shall have to appeal to Theorem 57 (p. 519) to
demonstrate that there is therefore also some deterministic TM that accepts L.

The Tape alphabet will be all the terminals and nonterminals of G and the symbol $ 
(which we presume is not used in G). When we begin processing, the Tape contains a string 
of terminals. It will be accepted if it is generated by G but will not be accepted otherwise. 

Step 1 We insert a $ in cell i, moving the input to the right, and insert another $ in the
cell after the input string and an S after that. We leave the Tape Head pointing
to the second $:

i    ii   iii  iv
a    b    b    Δ   . . .

becomes

i    ii   iii  iv   v    vi
$    a    b    b    $    S    Δ . . .
                    ↑

Step 2 We now enter a great central state that will serve the same purpose as the
central POP in the PDA simulation of a CFG in Chapter 15. The field of the Tape
beginning with the second $ is where we will keep track of the working string.
The basic strategy is to simulate the derivation of the input word in the working
string field.

We shall construct a branch from this central state that simulates the
application of each production to a working string as follows. Consider any
production

x₁x₂x₃ . . . → y₁y₂y₃ . . .

where the x's are any left side of a production in the grammar G and the y's are
the corresponding right side. Move the Tape Head nondeterministically up and
down the working string until it stops at some cell containing x₁. We now scan
the Tape to be sure that the immediate next subsequence is x₂x₃ . . . . When
we are confident that we have found this string, we roll the Tape Head back to
point to x₁ (which we have conveniently marked) and proceed with a sequence
of deletes:


DELETE
DELETE
DELETE

just enough to delete the exact string of x's. Then we insert the specified string
of y's by this sequence:

INSERT y₁
INSERT y₂
. . .

just as many as there are y's on the right side. This accurately converts the working
string into another working string that is derivable from it in the grammar G by
application of this production.

We add a loop like this for each production in the grammar G: 



Step 3 If we were lucky enough to apply just the right productions, at just the right 
points in the working string, and in just the right sequence to arrive at a string 
of all terminals, we nondeterministically branch to a subprogram that compares 
the working string to the input string. If they match exactly, then the TM halts. 
If the input was in fact derivable, then some choice of path through this NTM 
will lead to HALT. If not, then either we will come to a working string from 
which there are no applicable productions and crash, or else we loop forever, 
producing longer and longer working strings, none of which will ever be equal 
to the input. 

This NTM accepts any word in the language generated by G and only 
these words. ■ 
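A deterministic program can explore all the nondeterministic choices of this NTM (which production, and where) level by level. The breadth-first sketch below is in the spirit of the conversion cited from Theorem 57 but is not the book's construction: it works on working strings directly rather than on a Tape, and it needs a step budget because a derivation search for a type 0 grammar need not terminate. Names are hypothetical; symbols are single characters with S as the start symbol.

```python
# Assumed conventions: one character per symbol, uppercase = nonterminal,
# "" stands for Λ, and "S" is the start symbol.
GRAMMAR = [("S", "aSBA"), ("S", "abA"), ("AB", "BA"),
           ("bB", "bb"), ("bA", "ba"), ("aA", "aa")]


def derivable(word, productions, max_steps):
    """Breadth-first search over all derivations of at most max_steps steps.

    Returns True if `word` is reached, False if the search exhausts every
    reachable working string (every branch of the NTM crashes), and None if
    the step budget runs out with branches still alive (undecided, as when
    the NTM might still be looping).
    """
    frontier = {"S"}
    seen = {"S"}
    for _ in range(max_steps + 1):
        if word in frontier:
            return True
        next_frontier = set()
        for working in frontier:
            # Nondeterministic choice 1: which production; choice 2: where.
            for lhs, rhs in productions:
                i = working.find(lhs)
                while i != -1:
                    candidate = working[:i] + rhs + working[i + len(lhs):]
                    if candidate not in seen:
                        seen.add(candidate)
                        next_frontier.add(candidate)
                    i = working.find(lhs, i + 1)
        if not next_frontier:
            return False
        frontier = next_frontier
    return None


print(derivable("aabbaa", GRAMMAR, 10))   # a^2 b^2 a^2 is in the language
print(derivable("aab", GRAMMAR, 8))       # not found within the budget
```

The three-valued result mirrors the three fates of an input in the text: HALT, crash, or a run that has not (yet) ended.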

THEOREM 76 

If a language is r.e., it can be generated by a type 0 grammar. 

PROOF 

The proof will be by constructive algorithm. We must show how to create a type 0 grammar 
that generates exactly the same words as are accepted by a given TM. From now on, we fix 
in our minds a particular TM. 

Our general goal is to construct a set of productions that “simulate” the working of this 
TM. But here we run into a problem: Unlike the simulations of TMs by PMs or 2PDAs, a 
grammar does not start with an input and run it to halt. A grammar must start with S and end 
up with the word. To overcome this discrepancy, our grammar must first generate all
possible strings of a's and b's (not as final words but as working strings with nonterminals in
them) and then test them by simulating the action of the TM upon them.

As we know, a TM can badly mutilate an input string on its way to the HALT state, so 
our grammar must preserve a second copy of the input as a backup. We keep the backup 
copy intact while we act on the other as if it were running on the input Tape of our TM. If 
this TM ever gets to a HALT state, we erase what is left of the mutilated copy and are left 
with the pristine copy as the word generated by the grammar. If the second copy does not
run successfully on the TM (it crashes or loops forever), then we never get to the stage of
erasing the working copy. Because the working copy contains nonterminals, this means that
we never produce a string of all terminals. This will prevent us from ever successfully generating
a word not in the language accepted by the TM. A derivation that never ends corresponds
to an input that loops forever. A derivation that gets stuck at a working string with
nonterminals still in it corresponds to an input that crashes. A derivation that produces a real
word corresponds to an input that runs successfully to HALT.

That is a rough description of the method we shall follow. The hard part is this: Where 
can we put the two different copies of the string so that the productions can act on only one 
copy, never on the other? In a derivation in a grammar, there is only one working string generated
at any time. Even in phrase-structure grammars, any production can be applied to any
part of the working string at any time. How do we keep the two copies separate? How do we
keep the first copy intact (immune from distortion by production) while we work on the second
copy?

The surprising answer to this question is that we keep the copies separate by interlacing 
them. We store them in alternate locations on the working string. 

We also use parentheses as nonterminals to keep straight which letters are in which 
copy. All letters following a “(” are in the first (intact) copy. All symbols before a “)” are in 
the second (TM Tape simulation) copy. We say “symbol” here because we may find any 
character from the TM Tape sitting to the left of a “)”. 

When we are finally ready to derive the final word because the second TAPE-simulating 
copy has been accepted by the TM, we must erase not only the remnants of the second copy, 
but also the parentheses and any other nonterminals used as TM-simulation tools. 

First, let us outline the procedure in even more detail, then formalize it, and then finally 
illustrate it. 

Step 1 In our approach, a string such as abba will be represented initially by the working
string

(aa)(bb)(bb)(aa)

We need to be able to generate all such working strings. The following productions
will suffice:

S → (aa)S | (bb)S | Λ

Later we shall see that we actually need something slightly different because of 
other requirements of the processing. 

Remember that “(” and “)” are nonterminal characters in our type 0 gram¬ 
mar that must be erased at the final step. 

Remember too that the first letter in each parenthesized pair will stay immutable
while we simulate the TM processing on the second letter of each pair
as if the string of second letters were the contents of the TM Tape during the
course of the simulation:

[Figure: in (aa)(bb)(bb)(aa), the first letter of each pair is the first copy of the input string, to remain intact; the second letter of each pair is the second copy, to be worked on as if it sits on the TM Tape.]

Step 2 Because a TM can use more Tape cells than just those that the input letters initially
take up, we need to add some blank cells to the working string. We must
give the TM enough Tape to do its processing job. We do know that a TM has a
Tape with infinitely many cells available, but in the processing of any particular
word it accepts, it employs only finitely many of those cells: a finite block of
cells starting at cell i. If it tried to read infinitely many cells in one running, it
would never finish and reach HALT. If the TM needs four extra cells of its Tape
to accept the word abba, we add four units of (ΔΔ) to the end of the working
string:

(aa)(bb)(bb)(aa)(ΔΔ)(ΔΔ)(ΔΔ)(ΔΔ)

[Figure annotations: the first letters of the first four pairs simulate the input string, and the first Δ's of the extra pairs are useless characters we will erase later; the second letters of all the pairs are the input and blank cells simulating the TM Tape.]


Notice that we have made the symbol Δ a nonterminal in the grammar we
are constructing.
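Steps 1 and 2 together amount to a simple string construction. As a minimal Python sketch (the pair-as-tuple encoding and the names `initial_pairs` and `show` are assumptions made for illustration), the doubled input with its extra blank cells can be built like this:

```python
BLANK = "Δ"  # the TM blank, made a nonterminal of the grammar being built


def initial_pairs(word, extra_cells):
    """Steps 1 and 2: double each input letter into a pair (x x), then append
    extra_cells blank pairs (Δ Δ) as work space for the simulated TM."""
    return [(ch, ch) for ch in word] + [(BLANK, BLANK)] * extra_cells


def show(pairs):
    """Render pairs in the book's notation, e.g. (aa)(bb)."""
    return "".join(f"({x}{y})" for x, y in pairs)
```

For example, `show(initial_pairs("abba", 4))` gives `(aa)(bb)(bb)(aa)(ΔΔ)(ΔΔ)(ΔΔ)(ΔΔ)`. In the grammar itself the number of extra cells is not computed in advance; it is chosen nondeterministically during the derivation.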

Step 3 To simulate the action of a TM, we need to include in the working string an indication
of which state we are in and where the Tape Head is reading. As with
many of the TM simulations we have done before, we can handle both problems
with the same device.

We shall do this as follows. Let the names of the states in the TM be q0
(the start state), q1, q2, . . . . We insert a q in front of the parentheses of the
symbol now being read by the Tape Head. To do this, we have to make all the
q's nonterminals in our grammar.

Initially, the working string looks like this: 

q0(aa)(bb)(bb)(aa)(ΔΔ)(ΔΔ)(ΔΔ)(ΔΔ)

It may sometime later look like this: 

(aA)(bA)(bX)q6(aA)(Δb)(ΔM)(ΔΔ)(ΔΔ)

This will mean that the Tape contents being simulated are AAXAbMΔΔ and
the Tape Head is reading the fourth cell, while the TM program is in state
q6.
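To see that a working string really does carry all three pieces of information, here is a small Python sketch that reads them back out. It assumes a token list in which state names are strings and each parenthesized pair is a 2-tuple; `decode` is a hypothetical helper. In the check, the example above is read with Δ in the first position of the four extra pairs and in both positions of the last two, untouched cells.

```python
BLANK = "Δ"


def decode(tokens):
    """Read a working string back as TM status. tokens mixes state names
    (str, e.g. "q6") with pairs (first, second). Returns the state, the
    1-based cell the Tape Head reads, the simulated Tape, and the intact input."""
    state, head, tape, first_copy = None, None, [], []
    for tok in tokens:
        if isinstance(tok, str):      # state symbol: it sits before the cell being read
            state, head = tok, len(tape) + 1
        else:                         # parenthesized pair (first copy, Tape cell)
            first_copy.append(tok[0])
            tape.append(tok[1])
    word = "".join(ch for ch in first_copy if ch != BLANK)
    return state, head, "".join(tape), word
```

Applied to the example, `decode` reports state q6, cell 4, the Tape AAXAbMΔΔ, and the untouched input abba.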

To summarize, at every stage, the working string must: 

1. Remember the original input. 

2. Represent the Tape status, including Tape Head position. 

3. Reflect the state the TM is in.

Step 4 We also need to include as nonterminals in the grammar all the symbols that
the TM might wish to write on its Tape, the alphabet Γ. The use of these symbols
was illustrated above.

Step 5 Now in the process of simulating the operation of the TM, the working string 
could look like this: 

(aa)q3(bB)(bA)(aA)(ΔΔ)(ΔΔ)(ΔΔ)(ΔM)

The original string we are interested in is abba, and it is still intact in the
positions just after the “(”s.

The current status of the simulated TM Tape can be read from the characters
in front of the close parentheses. It is

aBAAΔΔΔM

The TM is in state q3, and the Tape Head is reading cell ii, as we can tell
from the positioning of the q3 in the working string.

To continue the simulation, we need to be able to change the working 
string to reflect the specific instructions in the particular TM; that is, we need to 
be able to simulate all possible changes in Tape status that the specific TM program
might produce.

Let us take an example of one possible TM instruction and see what productions
we must include in our grammar to simulate its operation. Suppose the TM
has the edge

[edge from q4 to q7 labeled (b, A, L)]

that is, from state q4 while reading a b, print an A, go to state q7, and move the
Tape Head left.

We need a production that causes our representation of the prior status of 
the TM to change into a working string that represents the outcome status of 
the TM. We need a production like 

(Symbol1 Symbol2) q4 (Symbol3 b) → q7 (Symbol1 Symbol2)(Symbol3 A)

where Symbol1 and Symbol3 are any letters in the input string (a or b) or the
Δ's in the extra (ΔΔ) factors, and Symbol2 is what is in the Tape in the cell to
the left of the b being read. Symbol2 will be read next by the simulated Tape
Head:



(Symbol1 Symbol2) q4 (Symbol3 b) → q7 (Symbol1 Symbol2)(Symbol3 A)

Here Symbol1 and Symbol3 are parts of the input string to be left intact.

This is not just one production, but a whole family of possibilities covering
all considerations of what Symbol1, Symbol2, and Symbol3 are:

(aa)q4(ab) → q7(aa)(aA)

(aa)q4(bb) → q7(aa)(bA)

(aa)q4(Δb) → q7(aa)(ΔA)

(ab)q4(ab) → q7(ab)(aA)

(ab)q4(bb) → q7(ab)(bA)

(bX)q4(Δb) → q7(bX)(ΔA)
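Because the family is generated mechanically, it can be written out by a program. This Python sketch assumes productions are pairs of token tuples (state names as strings, pairs as 2-tuples); the function name and the Tape alphabet used in the check, {a, b, Δ, A, X}, are hypothetical. Symbol1 and Symbol3 range over a, b, and Δ, while Symbol2 ranges over the whole Tape alphabet.

```python
from itertools import product


def left_move_productions(q_from, read, write, q_to, first_letters, tape_alphabet):
    """Family of productions (S1 S2) q_from (S3 read) -> q_to (S1 S2)(S3 write)
    simulating the TM edge q_from --(read, write, L)--> q_to.

    S1 and S3 range over first_letters (input letters and the blank);
    S2 ranges over tape_alphabet, since it is whatever the cell to the left holds."""
    family = []
    for s1, s2, s3 in product(first_letters, tape_alphabet, first_letters):
        lhs = ((s1, s2), q_from, (s3, read))
        rhs = (q_to, (s1, s2), (s3, write))
        family.append((lhs, rhs))
    return family
```

With three choices for Symbol1, five for Symbol2, and three for Symbol3, the edge from q4 to q7 yields 3 · 5 · 3 = 45 productions, among them the six listed above.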

Notice that the way this simulation is set up, there is no corresponding
grammatical production for moving left from cell i, because there would be no
(Symbol1 Symbol2) in front of q4 for such a move.
The simulation of a TM instruction that moves the Tape Head to the right
can be handled the same way:



“If in state q8 reading a B, write an X, move the Tape Head right, and go to
state q2” translates into the following family of productions:

q8(Symbol1 B) → (Symbol1 X)q2

where Symbol1 is part of the immutable first copy of the input string, or one of
the extra Δ's on the right end. Happily, the move-right simulations do not involve
as many unknown symbols of the working string.



We need to include productions in our grammar for all possible values for
Symbol1.
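The move-right family is much smaller, since only Symbol1 varies. The Python sketch below uses the same hypothetical token encoding as before (states as strings, pairs as 2-tuples) and adds an illustrative helper, `rewrite_once`, that applies the first matching production to a working string.

```python
def right_move_productions(q_from, read, write, q_to, first_letters):
    """Family of productions q_from (S1 read) -> (S1 write) q_to simulating
    the TM edge q_from --(read, write, R)--> q_to. Only S1 varies, over the
    input letters and the blank."""
    return [((q_from, (s1, read)), ((s1, write), q_to)) for s1 in first_letters]


def rewrite_once(tokens, family):
    """Apply the first production in family whose left side occurs in tokens;
    return None if none applies."""
    for lhs, rhs in family:
        n = len(lhs)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == lhs:
                return list(tokens[:i]) + list(rhs) + list(tokens[i + n:])
    return None
```

For the instruction in quotation marks above, the family has three members, one for each of a, b, and Δ in the first position, and applying it to (aa)q8(bB)(ΔΔ) gives (aa)(bX)q2(ΔΔ).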

Let us be clear here that we do not include in our grammar productions for 
all possible TM instructions, only for those instructions that do label the edges 
in the specific TM we are trying to simulate. 

Step 6 Finally, let us suppose that after generating the doubled form of the word and simulating
the operation of the TM on its Tape, we eventually are led into a HALT
state. This means that the input we started with is accepted by this TM. We then
want to let the type 0 grammar finish the derivation of that word (in our example,
the word abba) by letting it mop up all the garbage left in the working string. The
garbage is of several kinds: There are Δ's, the characters in Γ, the q-symbol for the
HALT state itself, and, let us not forget, the extra a's and b's that are lying around
on what we think are Tape-simulating locations, but which just as easily could be
mistaken for parts of the final word, and then, of course, the parentheses.

We also want to be very careful not to trigger this mop-up operation unless 
we have actually reached a HALT state. 

We cannot simply add the productions 

Unwanted symbols → Λ

because this would allow us to accept any input string at any time. Remember that
in a grammar (phrase-structure or other) we are at all times free to execute any
production that can apply. To force the sequencing of productions, we must 
have some productions that introduce symbols that certain other productions 
need before they can be applied. What we need is something like 

If there is a HALT state in the working string, then unwanted symbols → Λ.

We can actually accomplish this conditional wipe-out in type 0 grammars
in the following way: Suppose q11 is a HALT state. We first add productions
that allow us to put a copy of q11 in front of each set of parentheses. This requires
all possible productions of these two forms:

(Symbol1 Symbol2) q11 → q11 (Symbol1 Symbol2) q11

where (Symbol1 Symbol2) is any possible parenthesized pair. This allows
q11 to propagate to the left. We do this for HALT states and only HALT states.

We also need

q11 (Symbol1 Symbol2) → q11 (Symbol1 Symbol2) q11

allowing q11 to propagate to the right.

This will let us spread the q11 to the front of each factor as soon as it makes
its appearance in the working string. It is like a cold: Every factor catches it. In
this example, we start with q11 in front of only one parenthesized pair and let it
spread until it sits in front of every parenthesized pair:

(aA)(bB)q11(bB)(aX)(ΔX)(ΔM)

⇒ (aA)q11(bB)q11(bB)(aX)(ΔX)(ΔM)

⇒ q11(aA)q11(bB)q11(bB)(aX)(ΔX)(ΔM)
⇒ q11(aA)q11(bB)q11(bB)q11(aX)(ΔX)(ΔM)
⇒ q11(aA)q11(bB)q11(bB)q11(aX)q11(ΔX)(ΔM)
⇒ q11(aA)q11(bB)q11(bB)q11(aX)q11(ΔX)q11(ΔM)
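The spreading can be imitated mechanically. This Python sketch applies the two propagation rules to a token list (states as strings, pairs as 2-tuples, Δ for blank) until nothing changes; the encoding and the name `spread_halt` are illustrative assumptions. It deliberately never uses the right-spreading rule on the last pair, because a HALT symbol placed after the last parenthesis could never be erased, as the text remarks later.

```python
def spread_halt(tokens, halt):
    """Apply (P) halt -> halt (P) halt and halt (P) -> halt (P) halt until
    every pair has a halt symbol in front of it. The right-spreading rule is
    not used on the last pair, since a halt symbol there could never be erased."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i, tok in enumerate(tokens):
            if not isinstance(tok, tuple):
                continue
            halt_before = i > 0 and tokens[i - 1] == halt
            halt_after = i + 1 < len(tokens) and tokens[i + 1] == halt
            if halt_after and not halt_before:
                tokens.insert(i, halt)        # (P) halt -> halt (P) halt
            elif halt_before and not halt_after and i + 1 < len(tokens):
                tokens.insert(i + 1, halt)    # halt (P) -> halt (P) halt
            else:
                continue
            changed = True
            break                             # rescan after each rewrite
    return tokens
```

Started from (aA)(bB)q11(bB)(aX)(ΔX)(ΔM), this reaches the same saturated string as the derivation above; a state symbol other than the HALT state is left where it is, since no spreading productions exist for it.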

The q's that are not HALT states cannot be spread because we do not include
such productions in our grammar to spread them.

Now we can include the garbage-removal productions 

q11(a Symbol2) → a
q11(b Symbol2) → b
q11(Δ Symbol2) → Λ

for any choice of Symbol2. This will rid us of all the Tape simulation characters,
the extra Δ's, and the parentheses, leaving only the first copy of the original
input string we were testing. Only the immutable copy remains; the scaffolding
is completely removed. ■
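Once the HALT symbol has saturated the working string, the garbage-removal productions amount to keeping the first letter of each pair and dropping Δ. A minimal Python sketch, assuming the same token encoding as before and a hypothetical function name `clean_up`:

```python
BLANK = "Δ"


def clean_up(tokens, halt):
    """Apply halt (a S2) -> a, halt (b S2) -> b, halt (Δ S2) -> Λ to each pair.
    Returns the derived word, or None if the working string is not in the
    saturated form halt (P) halt (P) ... halt (P), e.g. a pair with no halt
    symbol in front of it or a halt symbol left after the last pair."""
    word = []
    i = 0
    while i < len(tokens):
        is_halt_pair = (tokens[i] == halt and i + 1 < len(tokens)
                        and isinstance(tokens[i + 1], tuple))
        if not is_halt_pair:
            return None
        first = tokens[i + 1][0]      # the immutable first-copy letter
        if first != BLANK:
            word.append(first)        # a or b survives; Δ is erased
        i += 2
    return "".join(word)
```

On the saturated string at the end of the spreading example, this returns abba. A trailing HALT symbol after the last pair makes it return None, which matches the remark that such a symbol can never be removed.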


ALGORITHM 

Here are the formal rules describing the grammar we have in mind. In general, the productions
for the desired type 0 grammar are the following, where we presume that S, X, Y are
not letters in Σ or Γ:


Prod 1 S → q0X
Prod 2 X → (aa)X
Prod 3 X → (bb)X
Prod 4 X → Y
Prod 5 Y → (ΔΔ)Y
Prod 6 Y → Λ

Prod 7 For all TM edges of the form

[edge from q_v to q_w labeled (t, u, R)]

create the productions

q_v(at) → (au)q_w
q_v(bt) → (bu)q_w
q_v(Δt) → (Δu)q_w

Prod 8 For all TM edges of the form

[edge from q_v to q_w labeled (t, u, L)]

create the productions

(Symbol1 Symbol2) q_v (Symbol3 t) → q_w (Symbol1 Symbol2)(Symbol3 u)

where Symbol1 and Symbol3 can each be a, b, or Δ and Symbol2 can be any character
appearing on the TM Tape, that is, any character in Γ.

Prod 9 If q_x is a HALT state in the TM, create these productions:

q_x (Symbol1 Symbol2) → q_x (Symbol1 Symbol2) q_x

(Symbol1 Symbol2) q_x → q_x (Symbol1 Symbol2) q_x

q_x(a Symbol2) → a
q_x(b Symbol2) → b
q_x(Δ Symbol2) → Λ

where Symbol1 = a, b, or Δ and Symbol2 is any character in Γ.

These are all the productions we need or want in the grammar. ■ 

Notice that Prods 1 through 6 are the same for all TMs. Production sets 7, 8, and 9 depend
on the particular TM being simulated.

Now come the remarks that convince us that this is the right grammar (or at least one of
them). Because we must start with S, we begin with Prod 1. We can then apply any sequence
of Prod 2's and Prod 3's so that, for any string such as baa, we can produce

S ⇒* q0(bb)(aa)(aa)X

We can do this for any string whether it can be accepted by the TM or not. We have not 
yet formed a word, just a working string. If baa can be accepted by the TM, there is a certain 
amount of additional space it needs on the Tape to do so, say, two more cells. We can create 
this work space by using Prods 4, 5, and 6 as follows: 

⇒ q0(bb)(aa)(aa)Y
⇒ q0(bb)(aa)(aa)(ΔΔ)Y
⇒ q0(bb)(aa)(aa)(ΔΔ)(ΔΔ)Y
⇒ q0(bb)(aa)(aa)(ΔΔ)(ΔΔ)

Other than the minor variation of leaving the Y lying around until the end and eventually 
erasing it, this is exactly how all derivations from this grammar must begin. The other productions
cannot be applied yet because their left sides include nonterminals that have not yet
been incorporated into the working string. 

Now suppose that q 4 is the only HALT state in the TM. In order ever to remove the 
parentheses from the working string, we must eventually reach exactly this situation: 

⇒* q4(b?)q4(a?)q4(a?)q4(Δ?)q4(Δ?)

where the five ?'s show some contents of the first five cells of the TM Tape at the time it
accepts the string baa. Notice that no rule of production can ever let us change the first
entry inside a parenthesized pair. This is our intact copy of the input to our simulated 
TM. 

We could only arrive at a working string of this form if, while simulating the processing 
of the TM, we entered the HALT state q 4 at some stage: 

⇒* (b?)(a?)q4(a?)(Δ?)(Δ?)

When this happened, we then applied Prod 9 to spread the q4's.

Once we have q 4 in front of every open parenthesis, we use Prod 9 again to reduce the 
whole working string to a string of all terminals: 

⇒* baa

All strings such as ba or abba . . . can be set up in the form 

q0(aa)(bb)(bb)(aa) . . . (ΔΔ)(ΔΔ) . . . (ΔΔ)

but only those that can then be TM-processed to get to the HALT state can ever be reduced 
to a string of all terminals by Prod 9. 

Notice that we can use Prod 9 to put a HALT state q_x behind the last parenthesis at the
end of the working string. However, if we do, it will never be removed by Prod 9 rules, and 
so it is self-destructive to do so. 

In short, all words accepted by the TM can be generated by this grammar and all words 
generated by this grammar can be accepted by the TM. ■ 


EXAMPLE 

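Let us consider a simple TM that accepts all words ending in a:

[Figure: a TM in which the start state q0 loops on (a, a, R) and (b, b, R), an edge labeled (Δ, b, L) leads from q0 to q1, and an edge labeled (a, a, R) leads from q1 to the HALT state q2.]

Note that the label on the edge from q0 to q1 could just as well have been (Δ, Δ, L), but
this works too.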

Any word accepted by this TM uses exactly one more cell of Tape than the space the input
is written on. Therefore, we can begin with the productions

Prod 1 S → q0X
Prod 2 X → (aa)X
Prod 3 X → (bb)X
Prod 4 X → (ΔΔ)

This is a minor variation: the new Prod 4 replaces the general Prods 4, 5, and 6 and removes
the need for the nonterminal Y.

Now there are four labeled edges in the TM; three move the Tape Head right, one left. 
These cause the formation of the following productions. From 
the loop labeled (a, a, R) at q0, we get

Prod 7(i) q0(aa) → (aa)q0
Prod 7(ii) q0(ba) → (ba)q0
Prod 7(iii) q0(Δa) → (Δa)q0

From the loop labeled (b, b, R) at q0, we get

Prod 7(iv) q0(ab) → (ab)q0
Prod 7(v) q0(bb) → (bb)q0
Prod 7(vi) q0(Δb) → (Δb)q0

From the edge labeled (a, a, R) from q1 to q2, we get

Prod 7(vii) q1(aa) → (aa)q2
Prod 7(viii) q1(ba) → (ba)q2
Prod 7(ix) q1(Δa) → (Δa)q2

From the edge labeled (Δ, b, L) from q0 to q1, we get

Prod 8 (uv)q0(wΔ) → q1(uv)(wb)

where u, v, and w can each be a, b, or Δ. (Because there are really 27 of these, let us pretend
we have written them all out.)

Because q 2 is the HALT state, we have 

Prod 9(i) q2(uv) → q2(uv)q2 where u, v = a, b, Δ

Prod 9(ii) (uv)q2 → q2(uv)q2 where u, v = a, b, Δ

Prod 9(iii) q2(au) → a where u = a, b, Δ

Prod 9(iv) q2(bu) → b where u = a, b, Δ

Prod 9(v) q2(Δu) → Λ where u = a, b, Δ

These are all the productions of the type 0 grammar suggested by the algorithm in the 
proof of Theorem 76 (p. 575). 

Let us examine the total derivation of the word baa: 
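One possible derivation can be checked line by line against the productions just listed. The Python sketch below builds those productions (with u, v, w ranging over a, b, Δ) and confirms that each working string in the list follows from the previous one by a single production. The token encoding, the helper names, and this particular order of the cleanup steps are assumptions made for illustration; other orders of spreading q2 work equally well.

```python
import re
from itertools import product

D = "Δ"                      # the blank
LETTERS = ("a", "b", D)      # u, v, w range over a, b, Δ in this example


def example_productions():
    """The productions listed above for the TM that accepts words ending in a."""
    P = [(("S",), ("q0", "X")),                          # Prod 1
         (("X",), (("a", "a"), "X")),                    # Prod 2
         (("X",), (("b", "b"), "X")),                    # Prod 3
         (("X",), ((D, D),))]                            # Prod 4
    for u in LETTERS:
        for t in ("a", "b"):                             # Prods 7(i)-(vi)
            P.append((("q0", (u, t)), ((u, t), "q0")))
        P.append((("q1", (u, "a")), ((u, "a"), "q2")))   # Prods 7(vii)-(ix)
    for u, v, w in product(LETTERS, repeat=3):           # Prod 8 (27 productions)
        P.append((((u, v), "q0", (w, D)), ("q1", (u, v), (w, "b"))))
    for u, v in product(LETTERS, repeat=2):              # Prods 9(i) and 9(ii)
        P.append((("q2", (u, v)), ("q2", (u, v), "q2")))
        P.append((((u, v), "q2"), ("q2", (u, v), "q2")))
    for u in LETTERS:                                    # Prods 9(iii)-(v)
        P += [(("q2", ("a", u)), ("a",)),
              (("q2", ("b", u)), ("b",)),
              (("q2", (D, u)), ())]
    return P


TOKEN = re.compile(r"\((.)(.)\)|(q\d+)|(.)")


def parse(text):
    """Turn book notation such as q0(bb)X into a tuple of tokens."""
    out = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            out.append((m.group(1), m.group(2)))     # a parenthesized pair
        else:
            out.append(m.group(3) or m.group(4))     # a state or a single symbol
    return tuple(out)


def one_step(cur, nxt, prods):
    """True if nxt comes from cur by rewriting one occurrence of one left side."""
    for lhs, rhs in prods:
        n = len(lhs)
        for i in range(len(cur) - n + 1):
            if cur[i:i + n] == lhs and cur[:i] + rhs + cur[i + n:] == nxt:
                return True
    return False


DERIVATION = ["S", "q0X", "q0(bb)X", "q0(bb)(aa)X", "q0(bb)(aa)(aa)X",
              "q0(bb)(aa)(aa)(ΔΔ)", "(bb)q0(aa)(aa)(ΔΔ)", "(bb)(aa)q0(aa)(ΔΔ)",
              "(bb)(aa)(aa)q0(ΔΔ)", "(bb)(aa)q1(aa)(Δb)", "(bb)(aa)(aa)q2(Δb)",
              "(bb)(aa)q2(aa)q2(Δb)", "(bb)q2(aa)q2(aa)q2(Δb)",
              "q2(bb)q2(aa)q2(aa)q2(Δb)", "bq2(aa)q2(aa)q2(Δb)",
              "baq2(aa)q2(Δb)", "baaq2(Δb)", "baa"]
```

The first six lines are the setting-up stage, the next five simulate the TM (ending with the step into q2), and the rest spread q2 and clean up.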



Notice that the first several steps are a setting-up operation and the last several steps are 
cleanup. 

In the setting-up stages, we could have set up any string of a's and b's. In this respect,
grammars are nondeterministic. We can apply these productions in several ways. If we set up 
a word that the TM would not accept, then we could never complete its derivation because 
cleanup can occur only once the HALT state symbol has been inserted into the working 
string, as this can only be when the TM being simulated has reached HALT. Once we have 
actually begun the TM simulation, the productions are determined, reflecting the fact that 
TMs are deterministic. 

Once we have reached the cleanup stage, we again develop choices. We could follow 
something like the sequence shown. Although there are other successful ways of propagating 
the q2 (first to the left, then to the right, then to the left again . . .), they all lead to the same
completely saturated working string with a q2 in front of everything. If they do not, the
cleanup stage will not work and an all-terminal string will not be produced. ■ 

Now that we have the tool of type 0 grammars, we can approach some other results 
about recursively enumerable languages that were too difficult to handle in Chapter 23 when 
we could only use TMs for the proofs, or can we? 
THE PRODUCT AND KLEENE CLOSURE
OF r.e. LANGUAGES

THEOREM 77 

If L1 and L2 are recursively enumerable languages, then so is L1L2. The recursively enumerable
languages are closed under product.

PROOF 

The proof will be by the same constructive algorithm we used to prove Theorem 37 (p. 380). 

Let L1 and L2 be generated by type 0 grammars. Add the subscript 1 to all the nonterminals
in the grammar for L1 (even the start symbol, which becomes S1). Add the subscript 2 to
all the nonterminals in the grammar for L2.

Form a new type 0 grammar that has all the productions from the grammars for L1 and
L2 plus the new start symbol S and the new production

S → S1S2

This grammar generates all the words in L1L2 and only the words in L1L2. The grammar
is type 0, so the language L1L2 is r.e. No? No.

Surprisingly, this proof is bogus. Consider the type 0 grammar 

S → a

aS → b

The language L generated by this grammar is the single word a, but the grammar for the language
LL that we have described in this alleged proof is

S → S1S2
S1 → a
aS1 → b
S2 → a
aS2 → b

which allows the derivation

S ⇒ S1S2 ⇒ aS2 ⇒ b

while, clearly, LL contains only the word aa.

What goes wrong here is that in the proof for CFGs the possible substitutions represented
by the productions of the two languages could not interact because the left side of
each production was a single nonterminal indexed by its grammar of origin. However, in this
situation substrings could occur in the working string spanning the break between that which
comes from S1 and that which comes from S2. These substrings might conceivably be the left
side of some production lying entirely within one of these languages, but a production that
could not arise within S1 or S2 alone.

In order to prevent this, we use the following trick. We index even the terminals in each
grammar with the subscript of its grammar. In this way, we turn the terminals into nonterminals
for the purpose of keeping the left sides of the rules of production distinct. What we
suggest is that a production in L1 like

abXSbS → bXX

becomes

a1b1X1S1b1S1 → b1X1X1
We also have to add the productions

a1 → a    b1 → b    a2 → a    b2 → b

so that we can finally reach a string of a's and b's as a final word in the product language.
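The difference between the bogus grammar and the indexed one can be seen by brute force on the example S → a, aS → b. In this Python sketch, symbols are strings, productions are pairs of tuples, and a working string is never allowed to exceed a length bound (enough for this small example). The names `index_grammar` and `words` are hypothetical.

```python
from collections import deque


def index_grammar(prods, k, terminals):
    """Subscript every symbol of every production with k (terminals included)
    and add the de-subscripting rules a_k -> a for each terminal."""
    def tag(side):
        return tuple(sym + str(k) for sym in side)
    indexed = [(tag(lhs), tag(rhs)) for lhs, rhs in prods]
    return indexed + [((t + str(k),), (t,)) for t in terminals]


def words(prods, start, terminals, max_len=8):
    """All-terminal strings derivable without any working string longer than max_len."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        s = queue.popleft()
        if all(sym in terminals for sym in s):
            found.add("".join(s))
        for lhs, rhs in prods:
            n = len(lhs)
            for i in range(len(s) - n + 1):
                if s[i:i + n] == lhs:
                    t = s[:i] + rhs + s[i + n:]
                    if len(t) <= max_len and t not in seen:
                        seen.add(t)
                        queue.append(t)
    return found


L = [(("S",), ("a",)), (("a", "S"), ("b",))]          # S -> a, aS -> b
naive = [(("S",), ("S1", "S2")),                       # only nonterminals subscripted
         (("S1",), ("a",)), (("a", "S1"), ("b",)),
         (("S2",), ("a",)), (("a", "S2"), ("b",))]
fixed = ([(("S",), ("S1", "S2"))]
         + index_grammar(L, 1, ("a", "b"))
         + index_grammar(L, 2, ("a", "b")))
```

The naive grammar yields both aa and the spurious b, while the grammar with subscripted terminals yields only aa, since a1 and S2 never form a left side together.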

We do not have to worry that a derivation will de-subscript the a’s and b’s prematurely 
and recreate the problem that we had before, because no substring of the working string, 
spanning the break in languages, can be the left side of any production in S2, because all such
left sides have every factor subscripted with a 2.

This then completes the proof of the theorem by constructive algorithm. ■ 

THEOREM 78 

If L is recursively enumerable, then L* is also. The r.e. languages are closed under Kleene star.

PROOF 

If we try to prove this theorem by a constructive algorithm similar to that for Theorem 38
(p. 384) for CFGs, we would start with

S → SS | Λ

and allow each S to produce an arbitrarily long sequence of S's, each turning into a word of
L. However, we may encounter the same problem that we saw in the last theorem. Some of
the S's would produce strings of terminals that can conceivably attach themselves onto part
of the derivation from the next S and make an otherwise unreachable production possible.
The idea that we could index each copy of the productions from S with a separate index runs
into a separate problem. Because the number of words from L that we wish to concatenate to
form a word in L* is potentially unbounded, the number of copies of S we need to make initially
is also unbounded. This means that, because each S is to become a different nonterminal,
the total number of nonterminals in the grammar is potentially unbounded. This violates
the definition of a grammar, even a type 0 grammar.

In order to keep the nonterminals in neighboring syllables from interacting, all we need
is two copies of the grammar for L, one indexed with 1's (even the a's and b's) and one indexed
with 2's. We must then be sure that from the initial S we derive only alternating types
of S's. The following productions will do the trick:

S → S1S2S | S1 | Λ

From this S we can produce only the strings Λ, S1, S1S2, S1S2S1, S1S2S1S2, . . . . Again, we
can have no cross-pollination of the derivations from neighboring S's. This and the indexing
of the entire grammar for L and the productions de-subscripting the terminals constitute the
complete grammar for L*. ■

EXAMPLE 

If L is the language generated by the type 0 grammar 

S → a    aS → b

then L* is generated by the grammar

S → S1S2S | S1 | Λ

S1 → a1    a1S1 → b1

S2 → a2    a2S2 → b2

a1 → a    b1 → b    a2 → a    b2 → b ■
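As a sanity check on the example, the following Python sketch builds this L* grammar from the grammar for L and lists the short words it derives. It is a bounded search (working strings of at most six symbols), so it illustrates rather than proves the claim; the helper names and the two-letter terminal alphabet are assumptions.

```python
from collections import deque

TERMINALS = ("a", "b")


def index_grammar(prods, k):
    """Subscript every symbol with k and add the rules a_k -> a and b_k -> b."""
    def tag(side):
        return tuple(sym + str(k) for sym in side)
    copy = [(tag(lhs), tag(rhs)) for lhs, rhs in prods]
    return copy + [((t + str(k),), (t,)) for t in TERMINALS]


def star_grammar(prods):
    """S -> S1 S2 S | S1 | Λ plus two indexed copies of the grammar for L."""
    top = [(("S",), ("S1", "S2", "S")), (("S",), ("S1",)), (("S",), ())]
    return top + index_grammar(prods, 1) + index_grammar(prods, 2)


def short_words(prods, max_word=3, max_len=6):
    """Words of length <= max_word reachable without any working string
    longer than max_len symbols (a bounded search, not a decision procedure)."""
    start = ("S",)
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        s = queue.popleft()
        if all(sym in TERMINALS for sym in s) and len(s) <= max_word:
            found.add("".join(s))
        for lhs, rhs in prods:
            n = len(lhs)
            for i in range(len(s) - n + 1):
                if s[i:i + n] == lhs:
                    t = s[:i] + rhs + s[i + n:]
                    if len(t) <= max_len and t not in seen:
                        seen.add(t)
                        queue.append(t)
    return found


L = [(("S",), ("a",)), (("a", "S"), ("b",))]      # S -> a, aS -> b
```

Since L = {a}, the search finds exactly Λ, a, aa, and aaa among words of length at most 3, and never a word containing b.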

CONTEXT-SENSITIVE GRAMMARS 
DEFINITION 

A generative grammar in which the left side of each production is not longer than the right 
side is called a context-sensitive grammar, denoted CSG, or type 1 by the table on p. 573. ■

Context-sensitive languages are actually what we presume to be the model for all human 
languages, but because we do not have a mathematical definition for the class of “human 
languages,” we cannot expect to have a mathematical proof of this fact. One thing that we do 
know about context-sensitive languages is that they are recursive. 

THEOREM 79 

For every context-sensitive grammar G, there is some special TM that accepts all the words 
generated by G and crashes for all other inputs. 

PROOF 

Let us assume the input string we are going to test is w, and we shall describe how our special TM, call it T, works on w.

All the rules of production in a type 1 grammar do not shorten the working string. They 
may lengthen it or leave it the same length. So, the derivation for w is a sequence of working 
strings, each as long as or longer than the one before it. 

In the shortest derivation for w, there is no looping, by which we mean that each working
string is different. It may be possible in the grammar G to replace XY with ZW and then
ZW with XY to get the same working string a second time, but it cannot be necessary to do 
so, and it cannot be part of the shortest derivation. 

A derivation is a path in the total language tree of G, which is just like the total language
trees for CFGs. We start at S and derive a second row by applying all the productions
applicable to produce new nodes of the tree. We can then reiterate the procedure and apply 
all productions possible to each existing node in a given row to produce the next row of the 
tree. Every time we produce a new node, we check to be sure that it is different from all the 
other previously derived nodes. 

Our particular TM will not generate the entire language derivable from G. It will terminate
any branch of the tree whose end node exceeds w in length. This will then be a finite
tree because there are only finitely many strings of characters from G that are no longer than w.
Therefore, in a finite number of steps, it will either find a derivation for w or determine that
there is none and crash.

Can a TM do all this? Of course. We start with w and insert markers around it. Then we
write S. Next we put a row marker to indicate that we are starting a new row of the tree. Subsequently
we enter a state that scans all the nodes on the previous row to see which have substrings
that are left sides of some rule of production in G. This TM is a specialized machine
and has all the information about the productions in G programmed into it, so this scanning
procedure is part of the TM program. The machine then copies the old node and makes the
substitution (using the appropriate sequence of DELETEs and INSERTs) and then checks to
see if the new node it just made is worth keeping. This means that the string is not a duplicate
of another node and not longer than w. Then we check to see whether the new node is w.
If it is, we go to HALT. If it is not, we put a node marker on the Tape and return to the next
node of the previous row not yet fully exploited (having left an indication of where we already
have been). Once we have explored all the nodes on the previous row, we have finished
creating the new row of the tree, and we place a row marker on the Tape and reiterate.

This TM will terminate if it does generate w or if it finds that while operating on a certain
row, it was able to contribute no new nodes to the next row. This is recognized by seeing
whether it prints two consecutive row markers. If it does this, it crashes. By the discussion
above, it must eventually do one of these two things. Therefore, this TM proves the language
of G is recursive. ■
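The row-by-row search of this proof translates almost directly into code. The Python sketch below assumes single-character symbols and productions given as pairs of strings; `csg_member` is a hypothetical name, and the test grammar S → aSBc | abc, cB → Bc, bB → bb (a standard noncontracting grammar for {aⁿbⁿcⁿ}) is chosen only as an illustration. Discarding duplicates and nodes longer than w is exactly what makes the search finite.

```python
def csg_member(word, prods, start="S"):
    """Decide whether a grammar whose productions never shorten the working
    string derives word. Rows of the derivation tree are built one at a time;
    duplicate nodes and nodes longer than word are discarded, so the search is finite."""
    if any(len(rhs) < len(lhs) for lhs, rhs in prods):
        raise ValueError("some production shortens the working string")
    seen = {start}
    row = [start]
    while row:
        if word in row:
            return True                   # the TM goes to HALT
        next_row = []
        for node in row:
            for lhs, rhs in prods:
                i = node.find(lhs)
                while i != -1:
                    child = node[:i] + rhs + node[i + len(lhs):]
                    if len(child) <= len(word) and child not in seen:
                        seen.add(child)
                        next_row.append(child)
                    i = node.find(lhs, i + 1)
        row = next_row                    # empty: two consecutive row markers, so crash
    return False
```

For example, with that grammar written as `[("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]`, the function accepts aabbcc and rejects abcabc after exploring only a handful of nodes.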

Why does this construction work for all type 1 grammars and yet not carry over to show 
that all type 0 grammars are also recursive? The answer is that because type 0 grammars can 
have productions that decrease the length of the working string, we cannot use the simple 
length analysis to be sure that w does not lie somewhere farther down any particular branch 
of the tree. No branches can be terminated and the tree may grow indefinitely. 

Knowing that a language is recursive translates into being able to decide membership for it. 

THEOREM 80 

Given G, a context-sensitive grammar, and w, an input string, it is decidable by a TM 
whether G generates w. 

PROOF 

We have not been very specific about how one inputs a grammar into a TM, but we can imagine 
some string of delimiters separating the productions, possibly allowing the production arrow to 
be a Tape character as well. What the TM we have in mind does is create the CWL code word 
for the TM based on G described in the previous theorem. Then it feeds both the coded TM and 
w into the universal TM. Because the coded TM either halts or crashes on w, this procedure will,
indeed, lead to a decision about w's membership in the language generated by G. ■

THEOREM 81 

There is at least one language L that is recursive but not context-sensitive. 

PROOF 

This we shall prove by constructing one. 

In the previous theorem, we indicated that there was some method for encoding an entire 
context-sensitive grammar into a single string of symbols. Listing the productions in any order
with their arrows and some special symbol as a separator is fine, because then a TM can
decide whether, given an input string, it is the code word for some CSG. It would have to see 
that between any two separators there was one and only one arrow and that the string on the 
right of the arrow was not shorter than the string on the left. It would also have to ensure that 
the left side of each production has some nonterminals. All these are elementary TM tasks. 
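To make the check concrete, here is a Python sketch for one possible encoding, which is our assumption and not the book's: productions separated by "/", each written as left side, ">", right side, with uppercase letters as nonterminals and lowercase letters as terminals. The function `is_csg_code` tests exactly the three conditions listed above.

```python
def is_csg_code(code, sep="/", arrow=">"):
    """Check a code word under a hypothetical encoding: productions separated
    by sep, each written left + arrow + right, with uppercase letters as
    nonterminals and lowercase letters as terminals."""
    if not code:
        return False
    for production in code.split(sep):
        if production.count(arrow) != 1:
            return False                      # exactly one arrow per production
        left, right = production.split(arrow)
        if not (left + right).isalpha():
            return False                      # only letters may appear
        if len(right) < len(left):
            return False                      # right side not shorter than left
        if not any(ch.isupper() for ch in left):
            return False                      # left side needs a nonterminal
    return True
```

Under this encoding, "S>a/aS>b" (the type 0 grammar from Theorem 77) is rejected because aS → b shortens the working string, while "S>aSBc/S>abc/cB>Bc/bB>bb" is accepted.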

Let us define the language L (we ran out of Turing’s names) as follows: 
L = {all the code words for context-sensitive grammars that cannot be
generated by the very grammars they encode}

Observation 

L is recursive. We can feed any string over the code word alphabet first into the TM that 
checks to be sure it represents a CSG and then into the membership testing machine both as 
grammar and input. This will definitely decide whether the input is a code word for a grammar
that accepts it; only it returns the exact opposite answer to the one we want for L. We
can either modify the machine to reverse HALT and crash (as we have done before) or use 
this TM the way it is now to show that the complement of L is recursive, and conclude that L 
is recursive that way. 

Observation 

L is not a context-sensitive language. If it were, then all its words would be generated by 
some CSG G. Let us consider the code word for G. If this code word is in L, then (as with 
words in L) it cannot be generated by the grammar it represents. But that would mean that 
some word in L cannot be generated by G, which is a contradiction. On the other hand, if the 
code word for G is not in L, that means the code word for G cannot be generated by the 
grammar it represents, and as such, by the definition of L, must be in L. Another contradiction.
The solution is that there is no such grammar G.

Taking the two observations together proves L is our counter-example. ■ 

PROBLEMS 

For problems 1, 2, and 3 consider the grammar 

Prod 1 S → ABS | Λ
Prod 2 AB → BA
Prod 3 BA → AB
Prod 4 A → a
Prod 5 B → b

1. Derive the following words from this grammar: 

(i) abba 

(ii) babaabbbaa 

2. Prove that every word generated by this grammar has an equal number of a's and b's.

3. Prove that all words with an equal number of a's and b's can be generated by this grammar.

4. (i) Find a grammar that generates all words with more a's than b's, MOREA, p. 205.

(ii) Find a grammar that generates all the words not in EQUAL. 

(iii) Is EQUAL recursive? 

For Problems 5 through 7, consider the following grammar over the alphabet Σ = {a b c}:


Prod 1 S → ABCS | Λ
Prod 2 AB → BA
Prod 3 BC → CB
Prod 4 AC → CA
Prod 5 BA → AB
Prod 6 CB → BC
Prod 7 CA → AC
Prod 8 A → a
Prod 9 B → b
Prod 10 C → c

5. Derive the following words: 

(i) ababcc 

(ii) cbaabccba 

6. Prove that all words generated by this grammar have equal numbers of a's, b's, and c's.

7. Prove that all words with an equal number of a's, b's, and c's (the language VERYEQUAL, p. 375) can be generated by this grammar.

For Problems 8 through 10, consider the following type 0 grammar over the alphabet Σ = {a b}:

Prod 1 S 
Prod 2 UV 
Prod 3 UV 
Prod 4 YX 
Prod 5 ZX 
Prod 6 Ya 
Prod 7 Yb 
Prod 8 Za 
Prod 9 Zb 
Prod 10 UV 
Prod 11 X 
Prod 12 aV 
Prod 13 bV 

8. Derive the following words from this grammar: 

(i) Λ

(ii) aa 

(iii) bb 

(iv) abab 

9. Show that if w is any string of a's and b's, then the word 

ww 

can be generated by this grammar. 

10. Suppose that in a certain generation from S we arrive at the working string 

wUVwX 

where w is some string of a's and b's. 

(i) Show that if we now apply Prod 10, we will end up with the word ww. 

(ii) Show that if instead we apply Prod 11 first, we cannot derive any other words.

(iii) Show that if instead we apply Prod 2, we must derive the working string 

waUVwaX 

(iv) Show that if instead we apply Prod 3, we must derive the working string 



wbUVwbX

(v) Use the fact that UVX is of the form wUVwX with w = Λ to prove that all words generated by this grammar are in the language DOUBLEWORD (p. 200).

11. Consider the following type 0 grammar over the alphabet Σ = {a}. Note: There is no b.


Prod 1 S → a
Prod 2 S → CD
Prod 3 C → ACB
Prod 4 C → AB
Prod 5 AB → aBA
Prod 6 Aa → aA
Prod 7 Ba → aB
Prod 8 AD → Da
Prod 9 BD → Ea
Prod 10 BE → Ea
Prod 11 E → a


(i) Draw the total language tree of this language to find all words of five or fewer letters generated by this grammar.

(ii) Generate the word a^9 = aaaaaaaaa.

(iii) Show that for any n = 1, 2, . . . , we can derive the working string

A^n B^n D

(iv) From A^n B^n D, show that we can derive the working string

a^{n^2} B^n A^n D

(v) Show that the working string in part (iv) generates the word

a^{(n+1)^2}

(vi) Show that the language of this grammar is

SQUARE = {a^{n^2} where n = 1 2 3 . . . }
= {a aaaa a^9 a^16 . . .}

12. What language is generated by the grammar

Prod 1 S → aXYba
Prod 2 XY → XYbZ | Λ
Prod 3 Zb → bZ
Prod 4 Za → aa

Prove any claim.

13. Analyze the following type 0 grammar:

Prod 1 S → A
Prod 2 A → aABC
Prod 3 A → abC
Prod 4 CB → BC
Prod 5 bB → bb
Prod 6 bC → b

(i) What are the four smallest words produced by this grammar?

(ii) What is the language of this grammar?



14. Show that the class of context-sensitive languages is closed under union.

15. Show that the class of context-sensitive languages is closed under product. 

16. Show that the class of context-sensitive languages is closed under intersection. 

17. Show that the class of context-sensitive languages is closed under Kleene closure. 

18. Show that if L is a CSL, then so is transpose(L).

19. A context-sensitive grammar is said to be in Kuroda normal form (after S. Y. Kuroda) if every production is of one of the following four forms:

A → a
A → B
A → BC
AB → CD

(i) Show that for every CSL there is a CSG in Kuroda normal form that generates it. 

(ii) Can this KNF be useful as a tool in parsing, that is, in deciding membership? 

20. In the proof that every type 1 grammar can be accepted by some TM, we simulated 
the productions of the grammar by a series of DELETES followed by a series of 
INSERTS. 

(i) Show that if the grammar being simulated were context-sensitive, the working 
string simulation field would never be larger than the input itself. 

(ii) Show that this means that the total length of the section of the TM Tape being used 
in the simulation reaches a maximum of 2n + 2 cells, where n is the length of the
input string. This is a simple linear function of the size of the input. This is what is 
meant by the terminology “linear bounded automaton.” 
CHAPTER 25
Computers




DEFINING THE COMPUTER 


The finite automata, as defined in Chapter 5, are only language-acceptors. When we gave them output capabilities, as with Mealy and Moore machines in Chapter 8, we called them transducers. The pushdown automata of Chapter 14 similarly do not produce output and are only language-acceptors. However, we recognized their potential as transducers for doing parsing in Chapter 18, by considering what is put into, left in, or popped from the STACK as output.

TMs present a completely different situation. They always have a natural output. When 
the processing of any given TM terminates, whatever is left on its Tape can be considered to 
be the intended, meaningful output. Sometimes, the Tape is only a scratch pad where the 
machine has performed some calculations needed to determine whether the input string 
should be accepted. In this case, what is left on the Tape is meaningless. For example, one 
TM that accepts the language EVENPALINDROME works by cancelling a letter each from 
the front and the back of the input string until there is nothing left. When the machine 
reaches HALT, the Tape is empty. 

However, we may use TMs for a different purpose. We may start by loading the Tape 
with some data that we want to process. Then we run the machine until it reaches the HALT 
state. At that time, the contents of the Tape will have been converted into the desired output, 
which we can interpret as the result of a calculation, the answer to a question, a manipulated 
file—whatever. 

So far, we have been considering only TMs that receive input from the language defined by 
(a + b)*. To be a useful calculator for mathematics, we must encode sets of numbers as words 
in this language. We begin with the encoding of the natural numbers as strings of a’s alone: 

The code for 0 = Λ
The code for 1 = a
The code for 2 = aa
The code for 3 = aaa
. . .

This is called unary notation because it uses only one digit (as opposed to binary, which uses two digits, or decimal with ten).

Every word in (a + b)* can then be interpreted as a sequence of numbers (strings of a's) separated internally by b's. For example, the decoding of (abaa) is 1, 2, and

bbabbaa = (no a's)b(no a's)b(one a)b(no a's)b(two a's)
represents 0, 0, 1, 0, 2.

Notice that we are assuming that there is a group of a's at the beginning of the string and at the end, even though these may be groups of no a's. For example,

abaabb = (one a)b(two a's)b(no a's)b(no a's)
which represents 1, 2, 0, 0.
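The decoding convention just described can be sketched in Python. The helpers decode and encode are hypothetical names used only for illustration. Python's str.split keeps empty pieces, which matches the assumption that there is a (possibly empty) group of a's at each end of the word and between any two b's.

```python
def decode(word: str) -> list[int]:
    """Read a word in (a + b)* as the sequence of numbers it represents.

    Each b separates two groups of a's. A group may be empty (the number 0),
    including the groups at the very beginning and end of the word.
    """
    if set(word) - {"a", "b"}:
        raise ValueError("only the letters a and b may appear")
    return [len(group) for group in word.split("b")]


def encode(numbers: list[int]) -> str:
    """Inverse of decode: write each number in unary and join the groups with b's."""
    return "b".join("a" * n for n in numbers)


print(decode("bbabbaa"))  # [0, 0, 1, 0, 2]
print(decode("abaabb"))   # [1, 2, 0, 0]
```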

When we interpret strings of a's and b's in this way, a TM that starts with an input string of a's and b's on its Tape and leaves an output string of a's and b's on its Tape can be considered to take in a sequence of specific input numbers and, after performing certain calculations, leave as a final result another sequence of numbers, the output numbers.

We are considering here only TMs that leave a's and b's on their Tapes; no special symbols or extraneous spaces are allowed among the letters, unless they too are given special output meanings.

We have already seen TMs that fit this description that had no idea they were actually performing data processing, because the interpretation of strings of letters as strings of numbers never occurred to them. “Calculation” is one of those words that we never really had a good definition for. Perhaps we are at last in a position to correct this.

EXAMPLE 

Consider the following TM called ADDER:

[Figure: START --(b,a,R)--> 1 --(Δ,Δ,L)--> 2 --(a,Δ,R)--> HALT, with a loop (a,a,R) at START and another at state 1.]

In START, we skip over some initial clump of a's, leaving them unchanged. When we read a b, we change it to an a and move to state 1. In state 1, a second b would make us crash. We skip over a second clump of a's until we run out of input string and find a Δ. At this point, we go to state 2, but we move the Tape Head left. We have now backed up into the a's. There must be at least one a here because we changed a b into an a to get to state 1. Therefore, when we first arrive at state 2, we erase an a, move the Tape Head right to HALT, and terminate execution.

For an input string to be accepted (lead to HALT), it has to be of the form a*ba*. If we start with the input string a^n b a^m, we end up with a^{n+m} on the Tape.

When we decode strings as sequences of numbers as above, we identify a^n b a^m with the two numbers n and m. The output of the TM is decoded as (n + m).

Under this interpretation, ADDER takes two numbers as input and leaves their sum on the Tape as output.

This is our most primitive example of a TM intentionally working as a calculator. ■ 

If we used an input string not in the form a*ba*, the machine would crash. This is analogous to our computer programs crashing if the input data are not in the correct format.
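A small simulator makes the run of ADDER concrete. The sketch below is a hypothetical Python encoding, assuming that a TM program is a dictionary from (state, scanned symbol) to (symbol written, direction, next state), with "Δ" standing for the blank. A missing entry, or a move left from cell i, counts as a crash.

```python
BLANK = "Δ"

# ADDER as described in the text: (state, read) -> (write, move, next state).
ADDER = {
    ("START", "a"): ("a", "R", "START"),
    ("START", "b"): ("a", "R", "1"),
    ("1", "a"): ("a", "R", "1"),
    ("1", BLANK): (BLANK, "L", "2"),
    ("2", "a"): (BLANK, "R", "HALT"),
}


def run_tm(program, word, max_steps=10_000):
    """Run a deterministic TM on word.

    Returns the nonblank Tape contents at HALT, or None if the machine crashes.
    """
    tape = list(word) or [BLANK]
    head, state = 0, "START"
    for _ in range(max_steps):
        if state == "HALT":
            return "".join(tape).rstrip(BLANK)
        key = (state, tape[head])
        if key not in program:
            return None  # no edge for this symbol: crash
        write, move, state = program[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return None  # moving left from cell i: crash
        if head == len(tape):
            tape.append(BLANK)  # the Tape is blank beyond the input
    raise RuntimeError("step limit reached")


print(run_tm(ADDER, "aaabaa"))  # aaaaa, that is, 3 + 2 = 5
print(run_tm(ADDER, "abb"))     # None: a second b crashes in state 1
```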

Our choice of unary notation is not essential; we could build an “adding machine” for 
any other base as well. 

EXAMPLE 

Let us build a TM that adds two numbers presented in binary notation and leaves the answer 
on the Tape in binary notation. 
We shall construct this TM out of two parts. First, we consider the TM T1 shown below:


[Figure: the TM T1; legible edge labels include (0,0,R), (1,1,R), and (1,0,L).]



This TM presumes that the input is of the form

$(0 + 1)*

It finds the last bit of the binary number and reverses it; that is, 0 becomes 1, 1 becomes 0. If the last bit was a 1, it backs up to the left and changes the whole clump of 1's to 0's, and the first 0 to the left of these 1's turns into a 1. All in all, this TM adds 1 to the binary number after the $. If the input was of the form $1*, the machine finds no 0 and crashes. In general, T1 increments by 1.
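The behavior of T1 on the bits after the $ can be sketched as a Python function. This is a hypothetical helper, not the TM itself. It follows the same right-to-left scan and raises an error exactly where T1 would crash.

```python
def t1_increment(bits: str) -> str:
    """Add 1 to a binary numeral the way T1 does.

    Raises ValueError where T1 would crash (an input of the form 1*).
    """
    cells = list(bits)
    i = len(cells) - 1              # start at the last bit
    while i >= 0 and cells[i] == "1":
        cells[i] = "0"              # the trailing clump of 1's becomes 0's
        i -= 1
    if i < 0:
        raise ValueError("no 0 to turn into a 1: T1 crashes")
    cells[i] = "1"                  # the first 0 left of the clump becomes 1
    return "".join(cells)


print(t1_increment("0111"))  # 1000
```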

Now let us consider the TM T2. This machine will accept a nonzero number in binary and subtract 1 from it. The input is presumed to be of the form $(0 + 1)*$ but not $0*$.

The subtraction will be done in a three-step process:

Step 1 Reverse the 0's and 1's between the $'s. This is called taking the 1's complement.

Step 2 Use T1 to add 1 to the number now between the $'s. Notice that if the original number was not 0, the 1's complement is not a forbidden input to T1 (i.e., not all 1's).

Step 3 Reverse the 0's and 1's again.

The total result is that what was x will become x − 1.

The mathematical justification for this is that the 1's complement of x (if it is n bits long) is the binary representation of the number

\[ (2^n - 1) - x \]

because when x is added to it, the sum is n solid 1's, which is 2^n − 1. Therefore,

\[ x \;\text{becomes}\; (2^n - 1) - x \qquad \text{(Step 1)} \]

\[ \text{which becomes}\; (2^n - 1) - x + 1 = (2^n - 1) - (x - 1), \text{ the 1's complement of } x - 1 \qquad \text{(Step 2)} \]

\[ \text{which becomes}\; (2^n - 1) - [(2^n - 1) - (x - 1)] = x - 1 \qquad \text{(Step 3)} \]

For example, 

$1010$ = binary for 10 

Becomes $0101$ = binary for 5 
Becomes $0110$ = binary for 6 
Becomes $1001$ = binary for 9 
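The three steps can be checked mechanically with a short Python sketch. The helper names complement, t1_increment, and t2_decrement are hypothetical. The sketch works on the bits between the $'s and mirrors Steps 1 to 3 rather than simulating T2 cell by cell.

```python
def complement(bits: str) -> str:
    """The 1's complement: reverse every 0 and 1 (Steps 1 and 3)."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def t1_increment(bits: str) -> str:
    """Add 1 as T1 does; raise ValueError on an input of all 1's."""
    cells = list(bits)
    i = len(cells) - 1
    while i >= 0 and cells[i] == "1":
        cells[i] = "0"
        i -= 1
    if i < 0:
        raise ValueError("T1 crashes on an input of all 1's")
    cells[i] = "1"
    return "".join(cells)


def t2_decrement(bits: str) -> str:
    """Subtract 1 from a nonzero binary numeral: complement, add 1, complement."""
    if "1" not in bits:
        raise ValueError("T2 is not defined on 0")
    return complement(t1_increment(complement(bits)))


print(t2_decrement("1010"))  # 1001, that is, 10 - 1 = 9
```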
T2 is shown next.

[Figure: the TM T2; legible edge labels include (0,1,R) and (1,0,R).]

We generally say T2 decrements by 1.

The binary adder we shall now build works as follows: The input strings will be of the form

$(0 + 1)*$(0 + 1)*

which we call 

$ x-part $ y-part 

We shall interpret the x-part and y-part as numbers in binary that are to be added. Furthermore, we make the assumption that the total x + y has no more bits than y itself. This is analogous to the addition of numbers in the arithmetic registers of a computer, where we presume that there will be no overflow.

If y is the larger number and starts with the bit 0, the condition is guaranteed. If not, we can INSERT 0 in front of y.

The algorithm to calculate x + y in binary will be this: 

Step 1 Check the x-part to see whether it is 0. If yes, halt. If no, proceed. 

Step 2 Subtract 1 from the x-part using T2 above.

Step 3 Add 1 to the y-part using T1 above.

Step 4 Go to step 1. 

The final result will be 

$0*$(x + y in binary)

Let us roughly illustrate the algorithm using analogous decimal numbers: 

$4$7 

Becomes $3$8 

Becomes $2$9 

Becomes $1$10 

Becomes $0$11 
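The loop of Steps 1 to 4 can be sketched in Python on the two parts of the Tape. The names below are hypothetical helpers, not the TM. As in the text, the sketch assumes that x + y fits in as many bits as y. If that assumption fails, t1_increment raises ValueError, which plays the role of a crash.

```python
def complement(bits: str) -> str:
    """Reverse every 0 and 1."""
    return "".join("1" if bit == "0" else "0" for bit in bits)


def t1_increment(bits: str) -> str:
    """Add 1 as T1 does; raise ValueError on an input of all 1's (overflow)."""
    cells = list(bits)
    i = len(cells) - 1
    while i >= 0 and cells[i] == "1":
        cells[i] = "0"
        i -= 1
    if i < 0:
        raise ValueError("T1 crashes on an input of all 1's")
    cells[i] = "1"
    return "".join(cells)


def t2_decrement(bits: str) -> str:
    """Subtract 1 as T2 does: complement, add 1, complement."""
    return complement(t1_increment(complement(bits)))


def binary_add(x: str, y: str) -> tuple[str, str]:
    """Return the final parts of the Tape $x$y: x all 0's, y holding x + y."""
    while "1" in x:             # Step 1: halt once the x-part is 0
        x = t2_decrement(x)     # Step 2: x <- x - 1
        y = t1_increment(y)     # Step 3: y <- y + 1
    return x, y                 # Step 4 is the loop itself


print(binary_add("01", "0111"))  # ('00', '1000')
```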



The full TM is

[Figure: the binary adder, a loop that tests (x ≠ 0), performs (x ← x − 1) with T2 and (y ← y + 1) with T1, and goes to HALT on (x = 0).]

Let us trace it on the input $01$0111, that is, x = 1 and y = 7. The main stages of the Tape are

$01$0111 → $10$0111 → $11$0111 → $00$0111   (x ← x − 1: complement, add 1, complement)
$00$0111 → $00$0110 → $00$0100 → $00$0000 → $00$1000   (y ← y + 1)
$00$1000   (x = 0) → HALT


The correct binary total is 1000, which is on the Tape when the TM halts. ■ 


DEFINITION 

If a TM has the property that for every word it accepts, at the time it halts, it leaves one solid string of a's and b's on its Tape starting in cell i, we call it a computer. The input string we call the input (or string of input numbers), and we identify it as a sequence of nonnegative integers. The string left on the Tape we call the output and identify it also as a sequence of nonnegative integers. ■

In the definition above, we use the semiambiguous word “identify” because we do not wish to restrict ourselves to unary encoding or binary encoding or any other particular system.


COMPUTABLE FUNCTIONS

Now we finally know what a computer is. Those expensive boxes of electronics sold as computers are only approximations to the real McCoy. For one thing, they almost never come with an infinite memory like a true TM. At this stage in our consideration, we are dealing only with zero and the positive integers. Negative numbers and numbers with decimal points can be encoded into nonnegative integers for TMs as they are for electronic digital computers. We do not worry about this generality here. Let us define the new symbol “∸” to use instead of the regular minus sign.

DEFINITION 

If m and n are nonnegative integers, then their simple subtraction is defined as

\[ m \mathbin{\dot{-}} n = \begin{cases} m - n & \text{if } m \ge n \\ 0 & \text{if } m < n \end{cases} \]

Essentially, what ∸ does is perform regular subtraction and then round all negative answers back up to 0. ■


Simple subtraction is often called proper subtraction or even monus. 
EXAMPLE

Consider the TM below called MINUS:

[Figure: the TM MINUS; its first edge, out of START, is labeled (a,A,R).]


This machine works as follows. To get from START to state 3, the input on the Tape must have been of the form aa*ba*, or else the machine would crash. This can be interpreted as starting with two numbers, the first of which is not 0.

Along the way to state 3, we have changed the first a into A, the usual expedient to guarantee that we do not accidentally move left from cell i while backing up.

Notice that the Tape Head is reading the last nonblank character when we enter state 3. If what is being read in state 3 is a b, it signifies that our task (which we have not yet explained) is done. We erase the b and move to state 4. This state leaves all a's and Δ's as it finds them and seeks the A in cell i. When this is found, it is changed back into an a and the process halts.

If the character read in state 3 is an a, a different path is followed. The a is erased while moving to state 5. Here, we move left, seeking the center b. When we find it, we reach state 6 and continue left in search of the last a of the initial group of a's. We find this, erase it, and move to state 7. State 7 moves right, seeking the center b. We cross this going to state 8, where we seek the last a of the second group of a's. When this is located, we return to state 3. The circuit

state 3 → state 5 → state 6 → state 7 → state 8 → state 3

cancels the last a of the second group against the last a of the first group.
For example, what starts as Aaaaabaa becomes AaaaΔbaΔ, which then becomes AaaΔΔbΔΔ. Now from state 3, we follow the path state 3 → state 4 → HALT, leaving aaa alone on the Tape. This is the correct result of the subtraction 5 ∸ 2.

The only possible deviation from this routine is to find that the a that is to be cancelled from the first group is the A in cell i. This could happen if the two groups of a's are initially the same size, or if the second group is larger:

aabaa → Aabaa → AabaΔ → AΔba → AΔbΔ → AΔbΔ → Δ . . .

or

aabaaa → Aabaaa → AabaaΔ → AΔbaa → AΔbaΔ → AΔbaΔ → Δ . . .

If this happens, states 9 and 10 erase everything on the Tape and leave the answer zero (an all-blank Tape). It is not recorded whether this zero is the exact answer or a rounded-up answer.

If we start with a^m b a^n on the Tape, we will be left with a^{m-n} unless m ≤ n, in which case we will be left with only blanks.

This machine then performs the operation of simple subtraction as defined by the symbol ∸. ■

Notice that although this TM starts with a string in (a + b)* and ends with a string in 
(a + b)*, it does use some other symbols in its processing (in this case, A). 
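The effect of MINUS on well-formed input can be sketched at the level of strings. This hypothetical Python function, minus_unary, does not simulate the TM state by state. It repeats the cancellation circuit on counts, treats the marker A in cell i as something that can never be cancelled, and returns the empty string (standing for the all-blank Tape) when the cancellation would reach it.

```python
def minus_unary(word: str) -> str:
    """String-level sketch of MINUS on input a^m b a^n (m >= 1).

    The empty string stands for the all-blank Tape, that is, the answer 0.
    """
    first, sep, second = word.partition("b")
    if not sep or not first or set(first) - {"a"} or set(second) - {"a"}:
        raise ValueError("MINUS crashes unless the input has the form aa*ba*")
    m, n = len(first), len(second)
    while n > 0:
        if m == 1:
            # The a to be cancelled would be the marker A in cell i:
            # states 9 and 10 erase the Tape, leaving the answer 0.
            return ""
        m, n = m - 1, n - 1   # one trip around the circuit 3-5-6-7-8-3
    return "a" * m            # state 4 restores cell i and halts


print(minus_unary("aaaaabaa"))  # aaa, that is, 5 ∸ 2 = 3
```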

DEFINITION 

If a TM takes a sequence of numbers as input and leaves only one number as output, we say that the computer has acted like a mathematical function. Any operation that is defined on all sequences of K numbers (for some number K ≥ 1) and that can be performed by a TM is called Turing-computable or just computable. ■

The TMs in the last two examples, ADDER and MINUS, provide a proof of the following theorem.

THEOREM 82 

Addition and simple subtraction are computable. ■ 

In both of these examples, K = 2 (addition and subtraction are both defined on a sequence of two numbers). Both of these are functions (they leave a one-number answer).

THEOREM 83 

The function MAX(x, y), which is equal to the larger of the two nonnegative integers x and y, is computable.

PROOF 

We shall prove this by describing a TM that does the job of MAX. 

Let us use the old trick of building on previous results, in this case the machine MINUS. 
MINUS does make the decision as to which of the two numbers m or n is larger. If m is larger, m ∸ n leaves an a in cell i. If n is larger than (or equal to) m, cell i will contain a Δ. However, after the program is completed, it is too late to leave m or n on the Tape, because all that remains is m ∸ n.

Instead of erasing the a's from the two groups as we do in MINUS, let us make this modification. In the first section, let us turn the a's that we want to erase into x's, and let us turn the a's of the second section that we want to erase into y's. For example, what starts as aaaaabaa and on MINUS ends as aaa now should end as Aaaxxbyy.

Notice that we have left the middle b instead of erasing it, and we leave the contents of cell i A if it should have been a or, as we shall see, leave it a (if it should have been Δ).

The TM program that performs this algorithm is only a slight modification of MINUS. 


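[Figure: the first part of the TM for MAX, a slight modification of MINUS; one legible edge label is (a,a,R).]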



If we arrive at state 4, the first input group of a’s is larger. The Tape looks like this: 

Aa . . . aaxx . . . xxbyy . . . yy 

with the Tape Head reading the y to the right of the b. To finish the job of MAX, we must go right to the first Δ, then sweep down leftward, erasing all the y's and the b as we go and changing x's into a's, and finally stopping after changing A into a:

[Figure: the rest of the TM from state 4 to HALT; legible edge labels include (y,y,R), (y,Δ,L), (b,Δ,L), (x,a,L), (a,a,L), and (A,a,R).]



If we arrive at state 9, the second group is larger than or equal to the first. The Tape now looks like this:

axx . . . xxbaa . . . aayy . . . yy

with the Tape Head reading cell ii. Here, what we have to do is leave a number of a's equal to the former constitution of the second group of a's (the current a's and y's together). Now, since there are as many symbols before the b as y's, all we really need to do is erase the b and the y's, change the x's to a's, and shift the other a's one cell to the left (into the hole left by b). For example, axxxbaayyyy becomes aaaaΔaaΔΔΔΔ and then aaaaaa.

This TM program does all this: 

[Figure: states 9 and 11 of the TM; legible edge labels include (x,a,R), (b,a,R), (a,a,R), (y,Δ,R), and (Δ,Δ,L).]

What we actually did was change the b into an a instead of a Δ. That gives us one too many a's, so in state 11 we back up and erase one.

This machine is one of many TMs that does the job of MAX. ■ 
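The bookkeeping behind MAX can also be sketched at the string level. The hypothetical Python function max_unary below does not simulate the TM. It computes in one step the marked Tape that exists just before the cleanup phase (state 4 or state 9), together with the final answer. Its marked Tapes agree with the examples in the text, such as Aaaxxbyy for aaaaabaa and axxxbaayyyy for the state 9 example.

```python
def max_unary(word: str) -> tuple[str, str]:
    """Sketch of MAX on a^m b a^n (m >= 1): return (marked Tape, output)."""
    first, sep, second = word.partition("b")
    if not sep or not first or set(first) - {"a"} or set(second) - {"a"}:
        raise ValueError("expected input of the form aa*ba*")
    m, n = len(first), len(second)
    if n < m:
        # State 4 branch: all n a's of the second group became y's, each matched
        # by an x in the first group; the marker A in cell i was never reached.
        marked = "A" + "a" * (m - 1 - n) + "x" * n + "b" + "y" * n
        output = "a" * m   # erase y's and b, turn x's and A back into a's
    else:
        # State 9 branch: m - 1 pairs were matched, and one more y was written
        # before the search for an a in the first group reached cell i.
        marked = "a" + "x" * (m - 1) + "b" + "a" * (n - m) + "y" * m
        output = "a" * n   # x's and b become a's, y's are erased, one a removed
    return marked, output


print(max_unary("aaaaabaa"))  # ('Aaaxxbyy', 'aaaaa')
print(max_unary("aabaaa"))    # ('axbayy', 'aaa')
```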


EXAMPLE 


Let us trace the execution of the input aaabaa on this TM: 



START: aaabaa → 1: Aaabaa → 1: Aaabaa → 1: Aaabaa → 2: Aaabaa → 2: Aaabaa → 2: AaabaaΔ
→ 3: Aaabaa → 5: Aaabay → 5: Aaabay → 6: Aaabay → 7: Aaxbay → 8: Aaxbay → 8: Aaxbay
→ 3: Aaxbay → 5: Aaxbyy → 6: Aaxbyy → 6: Aaxbyy → 7: Axxbyy → 7: Axxbyy → 8: Axxbyy
→ 3: Axxbyy → 4: Axxbyy → 4: Axxbyy → 4: AxxbyyΔ → 10: Axxbyy → 10: AxxbyΔ → 10: AxxbΔΔ
→ 10: AxxΔΔΔ → 10: AxaΔΔΔ → 10: AaaΔΔΔ → HALT: aaaΔΔΔ

This is the correct answer because

MAX(3, 2) = 3


EXAMPLE 


To give equal time to the state 9 → state 11 → HALT branch, we trace the execution of the input aabaaa:

START: aabaaa → 1: Aabaaa → 1: Aabaaa → 2: Aabaaa → 2: Aabaaa → 2: Aabaaa → 2: AabaaaΔ
→ 3: Aabaaa → 5: Aabaay → 5: Aabaay → 5: Aabaay → 6: Aabaay → 7: Axbaay → 8: Axbaay
→ 8: Axbaay → 8: Axbaay → 3: Axbaay → 5: Axbayy → 5: Axbayy → 6: Axbayy → 6: Axbayy
→ 9: axbayy → 9: aabayy → 9: aaaayy → 9: aaaayy → 9: aaaaΔy → 9: aaaaΔΔΔ → 11: aaaaΔΔ
→ 11: aaaaΔ → 11: aaaa → HALT: aaaΔ
THEOREM 84

The IDENTITY function 

IDENTITY(n) = n for all n ≥ 0

and the SUCCESSOR function

SUCCESSOR(n) = n + 1 for all n ≥ 0

are computable. 

Note: These functions are defined on only one number (K = 1), so we expect input only 
of the form a*. 


PROOF 

The only trick in the IDENTITY function is to crash on all input strings in bad format, that is, not of the form a*:

[Figure: START loops on (a,a,R) and goes to HALT on (Δ,Δ,R).]

Similarly, SUCCESSOR is no problem:

[Figure: START loops on (a,a,R) and goes to HALT on (Δ,a,R).]
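Both machines are small enough to run with a tiny simulator. This Python sketch assumes a hypothetical encoding of a TM as a dictionary from (state, scanned symbol) to (symbol written, direction, next state), with "Δ" for the blank. Whenever no edge applies, the machine crashes and the function returns None.

```python
BLANK = "Δ"

IDENTITY = {
    ("START", "a"): ("a", "R", "START"),
    ("START", BLANK): (BLANK, "R", "HALT"),
}
SUCCESSOR = {
    ("START", "a"): ("a", "R", "START"),
    ("START", BLANK): ("a", "R", "HALT"),
}


def run_tm(program, word, max_steps=10_000):
    """Return the nonblank Tape contents at HALT, or None if the machine crashes."""
    tape = list(word) or [BLANK]
    head, state = 0, "START"
    for _ in range(max_steps):
        if state == "HALT":
            return "".join(tape).rstrip(BLANK)
        key = (state, tape[head])
        if key not in program:
            return None  # no edge for this symbol: crash
        write, move, state = program[key]
        tape[head] = write
        head += 1 if move == "R" else -1
        if head < 0:
            return None  # moving left from cell i: crash
        if head == len(tape):
            tape.append(BLANK)
    raise RuntimeError("step limit reached")


print(run_tm(SUCCESSOR, "aaa"))  # aaaa
print(run_tm(IDENTITY, "aba"))   # None: input in bad format crashes
```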


DEFINITION 

The ith of n selector function is the function that starts with a sequence of n nonnegative numbers and erases most of them, leaving only the ith one (whether that one is the largest or not). It is written

SELECT/i/n( , , . . . , )

where there is space for exactly n numbers inside the parentheses. For example,

SELECT/2/4(8, 7, 1, 5) = 7

SELECT/4/9(2, 0, 4, 1, 5, 9, 2, 2, 3) = 1 ■

THEOREM 85 

The ith of n selector function is computable for every value of i and n (where we assume i is less than or equal to n).

PROOF

We shall build a TM that shows that the “third of five” selector function is computable. The other SELECT/i/n functions can be constructed similarly.

The TM that operates as

SELECT/3/5(r, s, t, u, v)

begins with input of the form

a^r b a^s b a^t b a^u b a^v

It marks the first cell with a *; erases the first clump of a's and the first b, the next a's, and the next b; saves the next a's; and erases the next b, the next a's, the next b, and the last a's, all the time moving the Tape Head to the right.


[Figure: the TM for SELECT/3/5.]

For example,

aaababaabaaaaba

becomes

*ΔΔΔΔΔaaΔΔΔΔΔΔΔ

We now choose to shift the remaining a's down to the left to begin in cell i, which we marked with a *. We can use the TM subroutine DELETE. We keep deleting the contents of cell i until cell i contains an a. Then we stop. ■
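The net effect of the SELECT/3/5 machine, and of the other selector machines built the same way, can be sketched with string operations. The function select below is a hypothetical helper. It returns the ith group of a's, which is what is left on the Tape after the shifting step.

```python
def select(i: int, n: int, word: str) -> str:
    """Sketch of SELECT/i/n on a word of n groups of a's separated by b's."""
    groups = word.split("b")
    if len(groups) != n or not 1 <= i <= n:
        raise ValueError("the input must hold exactly n numbers, with 1 <= i <= n")
    if any(set(group) - {"a"} for group in groups):
        raise ValueError("only a's may appear between the b's")
    return groups[i - 1]


print(select(3, 5, "aaababaabaaaaba"))  # aa
```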


THEOREM 86 

Multiplication is computable. 


PROOF 

The proof will be by constructive algorithm. This machine, called MPY, takes strings of the form a^m b a^n and leaves on the Tape a^{mn}. To make things easier on ourselves, we shall build a machine that rejects the input if n or m is zero; however, if we wanted to, we could build the machine differently to allow multiplication by zero (see the Problems section).

The algorithm this machine will follow is to insert a b in the first cell and place the symbol # after the entire input string. Then, to the right of the #, it will write one copy of the string a^n for each a in the string a^m, one by one erasing the a's in the first string. For example, the multiplication of 3 times 2 proceeds in these stages:

baaabaa#
bΔaabaa#aa
bΔΔabaa#aaaa
bΔΔΔbaa#aaaaaa
The machine will then erase everything between and including the second b and the #. The Tape now looks like this:

bΔΔΔΔΔΔΔaaaaaa

For this machine, we shall spell out a simplified version of DELETE to shift the string of a's leftward to begin in cell ii. We do this because we want to make a complete trace of the running of the full TM.

MPY begins like this: 

[Figure: the first states of MPY, which check the input and place the initial b and the #.]

So far, we have checked the form of the input (so we can crash on improper inputs) and placed the initial b and the # where we want them.

Now we go back and find the first a in a^m and convert it into a Δ:


Now we find the beginning of the second factor a^n.


Now, one by one, we turn these a's in the second factor into A's and copy them on the other side of the #:

bΔaaabaa#
bΔaaabAa#a
bΔaaabAA#aa

[Figure: states 9 through 13 of MPY, which copy the second factor after the #; legible edge labels include (a,A,R) and (Δ,a,L).]
In state 9, we convert the first a into an A. In state 10, we move the Tape Head to the right, going through a's and the # and perhaps other a's, until we find the Δ. To get to state 11, we change the first Δ to an a and start the trip back down the Tape leftward. In state 11, we skip over a's and the # and more a's until we find the last copied A. In state 12, we look to the right of this A. If there is a #, then there are no more a's to copy and we go to state 13. If there is another a, it must be copied, so we change it to A and go back to state 10.

In state 13, we must change the A's back to a's so we can repeat the process. Then we look for the next a in the first factor:

[Figure: states 13 through 16 of MPY; legible edge labels include (A,a,L) and (a,a,L).]

After changing the A's back to a's, we move left, through the middle b, into whatever is left of the first factor a^m. If the cell to the immediate left of b is blank, then the multiplication is finished and we move to state 15. If the cell to the left of b has an a in it, we go to state 16. Here, we move leftward through the a's until we find the first Δ, then right one cell to the next a to be erased. Changing this a to a Δ, we repeat the process of copying the second factor into the Δ's after the # and a's by returning to state 8.

When we get to state 15, we have the simple job left of erasing the now useless second factor:


[Figure: states 15 through 20 of MPY; legible edge labels include (a,b;Δ,R), (a,a,R), and (a,a,L).]



Going to state 18, we change the # into an a so we must later erase the end a. Using states 
18 and 19, we find the end a and erase it. In state 20, we go back down the Tape to the left to see 
if there are more A’s in front of the answer. If so, we make one an a and go back to state 18. If 
not, we encounter the b in cell i, delete it, and halt. This completes the machine MPY. ■ 
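Stripped of the Tape mechanics, the copying loop of MPY can be sketched in Python. The function mpy is hypothetical. It keeps the restriction in the text that neither factor is zero and builds the answer by appending one copy of a^n for each a of the first factor, in the spirit of the loop through states 8 to 16 that writes to the right of the #.

```python
def mpy(word: str) -> str:
    """Sketch of MPY on a^m b a^n with m, n >= 1: return a^(mn)."""
    first, sep, second = word.partition("b")
    if not sep or not first or not second or set(first + second) - {"a"}:
        raise ValueError("MPY rejects inputs not of the form aa*baa*")
    after_hash = []              # the a's written to the right of the #
    for _ in first:              # one pass for each a of the first factor
        for _ in second:         # copy the second factor one a at a time
            after_hash.append("a")
    return "".join(after_hash)


print(mpy("aabaa"))  # aaaa, that is, 2 times 2 is 4
```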

EXAMPLE 

Let us write out the full trace of MPY on the input aabaa:

START: aabaa → INSERT b: baabaa → 1: baabaa → 2: baabaa → 2: baabaa

[Trace continues through states 3 to 20, copying aa after the # once for each a of the first factor and then shifting the answer left; its final steps are DELETE: baaaa → HALT: aaaa.]


This is how one TM calculates that 2 times 2 is 4. No claim was ever made that this is a good way to calculate that 2 × 2 = 4, only that the existence of MPY proves that multiplication can be calculated, that is, is computable.

We are dealing here with the realm of possibility (what is and what is not possible), not optimality (how best to do it); that is why this subject is called computer theory, not “a practical guide to computation.”

Remember that electricity flows at (nearly) the speed of light, so there is hope that an electrical TM could calculate 6 × 7 before next April.

TMs are not only powerful language-recognizers, but they are also powerful calculators.


EXAMPLE 

A TM can be built to calculate square roots, or at least to find the integer part of the square root. The machine SQRT accepts an input of the form ba^n and tests all integers one at a time from 1 on up until it finds one whose square is bigger than n.

Very loosely, we draw this diagram (in the diagram, we have abbreviated SUCCESSOR as “Suc,” which is commonly used in this field):



Therefore, we can build SQRT out of the previous TMs we have made. ■
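The search that SQRT performs can be sketched in a few lines of Python. This is a hypothetical rendering of the loop, not the TM itself. It tries k = 1, 2, 3, . . . in turn (the role played by repeated SUCCESSOR steps) and stops at the first k whose square exceeds n, so the integer part of the square root is k − 1.

```python
def sqrt_floor(n: int) -> int:
    """Integer part of the square root of n >= 0, by trying 1, 2, 3, ... in turn."""
    if n < 0:
        raise ValueError("n must be nonnegative")
    k = 1
    while k * k <= n:   # the square is not yet bigger than n
        k += 1          # SUCCESSOR
    return k - 1


print(sqrt_floor(10))  # 3
```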

CHURCH’S THESIS

What functions cannot be computed by a TM? The answer is surprising: “It is believed that 
there are no functions that can be defined by humans, whose calculation can be described by 
any well-defined mathematical algorithm that people can be taught to perform, that cannot 
be computed by TMs. The TM is believed to be the ultimate calculating mechanism.” 

This statement is called Church’s thesis because Alonzo Church (1936 again) gave many sophisticated reasons for believing it. Church’s original statement was a little different because his thesis was presented slightly before Turing invented his machines. Church actually said that any machine that can do a certain list of operations will be able to perform all conceivable algorithms. He tied together what logicians had called recursive functions (after the work of Gödel) and computable functions (after the goal of Hilbert). TMs can do all that Church asked, so they are one possible model of the universal algorithm machines Church described.

Unfortunately, Church’s thesis cannot be a theorem in mathematics because ideas such as “can ever be defined by humans” and “algorithm that people can be taught to perform” are not part of any branch of known mathematics. There are no axioms that deal with “people.” If there were no axioms that dealt with triangles, we could not prove any theorems about triangles. There is no known definition for “algorithm” either, as used in the most general sense by practicing mathematicians, except that, if we believe Church’s thesis, we can define algorithms as what TMs can do. This is the way we have (up to today) resolved the old problem of, “Of what steps are all algorithms composed? What instructions are legal to put in an algorithm and what are not?”

Not all mathematicians are satisfied with this. Mathematicians like to include in their proofs such nebulous phrases as “case two can be done similarly,” “by symmetry we also know,” or “the case of n = 1 is obvious.” Many mathematicians cannot figure out what other mathematicians have written, so it is often hopeless to try to teach a TM to do so. However, our best definition today of an algorithm is that it is a TM.

Turing had the same idea in mind when he introduced his machines. He argued as follows.

If we look at what steps a human goes through in performing a calculation, what do we see? (Imagine a woman doing long division, e.g.) She writes some marks on a paper. Then by looking at the marks she has written, she can make new marks or, perhaps, change the old marks. If the human is performing an algorithm, the rules for putting down the new marks are finite. The new marks are entirely determined by what the old marks were and where they were on the page. The rules must be obeyed automatically (without outside knowledge or original thinking of any kind). A TM can be programmed to scan the old marks and write new ones following exactly the same rules. The Tape Head can scan back and forth over the whole page, row by row, and recognize the old marks and replace them with new ones. The TM can draw the same conclusions a human would as long as the human was forced to follow the rigid rules of an algorithm instead of using imagination.

Someday, someone might find a task that humans agree is an algorithm but that cannot 
be executed by a TM, but this has not yet happened. Nor is it likely to. People seem very 
happy with the Turing-Post-Church idea of what components are legal parts of algorithms. 

There are faulty “algorithms” that do not work in every case that they are supposed to 
handle. Such an algorithm leads the human up to a certain point and then has no instruction 
on how to take the next step. This would foil a TM, but it would also foil many humans. 
Most mathematics textbooks adopt the policy of allowing questions in the Problems section that cannot be completely solved by the algorithms in the chapter. Some “original thinking” is required. No algorithm for providing proofs for all the theorems in the Problems section is ever given. In fact, no algorithm for providing proofs for all theorems in general is known. Better or worse than that, it can be proved that no such algorithm exists.

We have made this type of claim at several places throughout this book; now we can make it specific. We can say (assuming, as everyone does, that Church’s thesis is correct) that anything that can be done by algorithm can be done by TM. Yet we have shown in the previous chapter that there are some languages that are not recursively enumerable. This means that the problem of deciding whether a given word is in one such particular language cannot be solved by any algorithm.

When we proved that the language PALINDROME is not accepted by any FA, that did not mean that there is no algorithm in the whole wide world to determine whether or not a given string is a palindrome. There are such algorithms. However, when we proved that ALAN is not r.e., we proved that there is no possible decision procedure (algorithm) to determine whether or not a given string is in the language ALAN.

Let us recall from Chapter 1 the project proposed by David Hilbert. When he saw problems arising in set theory, he asked that the following statements be proven:

1. Mathematics is consistent. Roughly, this means that we cannot prove both a statement 
and its opposite, nor can we prove something horrible like 1 = 2. 

2. Mathematics is complete. Roughly, this means that every true mathematical assertion 
can be proven. Because we might not know what “true” means, we can state this as: 
Every mathematical assertion can either be proven or disproven. 

3. Mathematics is decidable. This, as we know, means that for every type of mathematical 
problem there is an algorithm that, in theory at least, can be mechanically followed to 
give a solution. We say “in theory” because following the algorithm might take more 
than a million years and still be finite. 

Many thought that this was a good program for mathematical research, and most believed that all three points were true and could be proved so. One exception was the mathematician G. H. Hardy, who hoped that point 3 could never be proven, because if there were a mechanical set of rules for the solution of all mathematical problems, mathematics would come to an end as a subject for human research.

Hardy did not have to worry. In 1930 Kurt Gödel shocked the world by proving that points 1 and 2 are not both true (much less provable). Most people today hope that this means that point 2 is false, because otherwise point 1 has to be. Then in 1936, Church, Kleene, Post, and Turing showed that point 3 is false. After Gödel’s theorem, all that was left of point 3 was, “Is there an algorithm to decide whether a mathematical statement has a proof or a disproof, or whether it is one of the unsolvables?” In other words, can one invent an algorithm that can determine whether some other algorithm (possibly undiscovered) does exist that could solve the given problem? Here, we are not looking for the answer but merely good advice as to whether there even is an answer. Even this cannot be done. Turing’s proof of the undecidability of the halting problem meant, in light of Church’s thesis, that there is no possible algorithm to decide whether a proposed algorithm really works (terminates). Church showed that the first-order predicate calculus (an elementary part of mathematics) is undecidable. All hope for Hilbert’s program was gone.

We have seen Post’s and Turing’s conception of what an algorithm is. Church’s model of computation, called the lambda calculus, is also elegant but less directly related to computer theory on an elementary level, so we have not included it here. The same is true of the work of Gödel and Kleene on μ-recursive functions. Two other interesting models of computation can be used to define “computability by algorithm.” A. A. Markov (1951) defined a system today called Markov algorithms, or MA, which are similar to type 0 grammars, and J. C. Shepherdson and H. E. Sturgis (1963) proposed a register machine, or RM, which is similar to a TM. Just as we might have suspected from Church’s thesis, these methods turned out to have exactly the same power as TMs. Of the mathematical logicians mentioned, only Turing and von Neumann carried their theoretical ideas over to the practical construction of electronic machinery and precipitated the invention of the computer.


TMs AS LANGUAGE GENERATORS 

So far, we have seen TMs in two of their roles, as transducer and as acceptor:

inputs x1, x2, x3, . . . → TRANSDUCER → outputs y1, y2, y3, . . .

inputs x1, x2, x3, . . . → ACCEPTOR


As a transducer, it is a computer, and as an acceptor, it is a decision procedure. There is 
another purpose a TM can serve. It can be a generator: 


GENERATOR → x1, x2, x3, . . .


DEFINITION 

A TM is said to generate the language 

L = {w1, w2, w3, . . .}

if it starts with a blank Tape and after some calculation prints a # followed by some word from L. Then there is some more calculation and the machine prints another # followed by another word from L. Again, there is more calculation and another # and another word from L appears on the Tape. And so on. Each word from L must eventually appear on the Tape inside of #’s. The order in which they occur does not matter and any word may be repeated indefinitely. ■
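To make the definition concrete, here is a hedged Python sketch of the Tape contents such a generator produces. Purely for illustration, it assumes the language L = {a^n b^n : n ≥ 1} and a machine that prints its words in increasing length. A real generator may print words in any order and may repeat them. The function names are hypothetical.

```python
from itertools import count, islice
from typing import Iterator

def generate_anbn() -> Iterator[str]:
    # Runs forever, as a generating TM does: ab, aabb, aaabbb, ...
    for n in count(1):
        yield "a" * n + "b" * n

def tape_after(k: int) -> str:
    # Contents of the Tape once the first k words have been printed,
    # each word standing between #'s as the definition requires.
    return "#" + "".join(w + "#" for w in islice(generate_anbn(), k))

print(tape_after(3))  # prints #ab#aabb#aaabbb#
```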

This definition of generating a language is also called enumerating it. With our next two theorems, we shall show that any language that can be generated by a TM can be accepted by some TM and that any language that can be accepted by a TM can be generated by some TM. This finally explains why the languages accepted by TMs were called recursively enumerable.


THEOREM 87 

If the infinite language L can be generated by the TM T_g, then there is another TM, T_a, that accepts L.


PROOF 

The proof will be by constructive algorithm. We shall show how to convert T_g into T_a.

To be a language-acceptor, T_a must begin with an input string on its Tape and end up in HALT when and only when the input string is in L.

The first thing that T_a does is put a $ in front of the input string and a $ after it. In this way, it can always recognize where the input string is no matter what else is put on the Tape. Now T_a begins to act like T_g in the sense that T_a imitates the program of T_g and begins to generate all the words in L on the Tape to the right of the second $. The only modification is that every time T_g finishes printing a word of L and ends with a #, T_a leaves its copy of the program of T_g for a moment to do something else. T_a instead compares the most recently generated word of L against the input string inside the $’s. If they are the same, T_a halts and accepts the input string as legitimately being in L. If they are not the same, the result is inconclusive. The word may yet show up on the Tape. T_a therefore returns to its simulation of T_g.

If the input is in L, it will eventually be accepted. If it is not, T_a will never terminate execution. It will wait forever for this word to appear on the Tape.

accept(T_a) = L
loop(T_a) = L'
reject(T_a) = φ

Although the description above of this machine is fairly sketchy, we have already seen 
TM programs that do the various tasks required: inserting $, comparing strings to see if they 
are equal, and jumping in and out of the simulation of another TM. This then completes the 
proof. ■ 
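The construction can be mirrored in Python, with a Python generator standing in for T_g. This is only a sketch and the names are hypothetical. The example language {a^n b^n : n ≥ 1} is an assumption, chosen because it is infinite, as Theorem 87 requires.

```python
from itertools import count
from typing import Callable, Iterator

def acceptor_from_generator(
    generate: Callable[[], Iterator[str]],
) -> Callable[[str], bool]:
    """Sketch of T_a built from T_g, following the proof of Theorem 87."""
    def accepts(w: str) -> bool:
        # Simulate T_g; after each word it prints, compare that word
        # with the input w (the string between the $'s).
        for word in generate():
            if word == w:
                return True  # HALT: w has appeared on the Tape
            # Otherwise inconclusive: resume simulating T_g.
        # Reached only if the generator stops, which a TM that
        # generates an infinite language never does.
        return False
    return accepts

def generate_anbn() -> Iterator[str]:
    # Example T_g (an assumption): prints ab, aabb, aaabbb, ... forever.
    for n in count(1):
        yield "a" * n + "b" * n

accepts = acceptor_from_generator(generate_anbn)
print(accepts("aaabbb"))  # prints True
# accepts("aab") would never return: "aab" is in loop(T_a).
```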

THEOREM 88 

If the language L can be accepted by the TM T_a, then there is another TM, T_g, that generates it.

PROOF 

The proof will be by constructive algorithm. What we would like to do is to start with a subroutine that generates all strings of a’s and b’s one by one in size and alphabetical order:

Λ a b aa ab ba bb aaa aab . . .

We have seen how to do this by TM before in the form of the binary incrementor appropriately modified. After each new string is generated, we run a simulation of it on the machine T_a. If T_a halts, we print out the word on the Tape inside #’s. If T_a crashes, we skip it and go on to the next possibility from the string generator, because this string is not in the language.
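The string generator the proof starts from can be sketched in Python. This sketch uses itertools.product rather than a binary incrementor, so it shows the order in which the strings appear, not the TM that produces them. The function name all_strings is hypothetical, and the empty Python string plays the role of Λ.

```python
from itertools import count, islice, product
from typing import Iterator

def all_strings(alphabet: str = "ab") -> Iterator[str]:
    """Every string over the alphabet, shortest first and
    alphabetically within each length; the first one is the null string."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

print(list(islice(all_strings(), 9)))
# prints ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab']
```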

Unfortunately, if the T_a simulation does not halt or crash, we are stuck waiting forever and we cannot go on to test the next possible input string. What we must do is not invest an indefinite amount of time investigating the acceptability of every word on T_a. Now, of course, we cannot simply abandon a calculation that has been running a long time and say, “well, it’s probably hopeless,” since we know, by the very fact that the halting problem is undecidable, that some input strings which look like they are going to run forever are, surprisingly, eventually accepted. So, we cannot wait for every string to be decided, nor can we abandon any string that is running too long. What can we do?

The answer is that we run some number of steps of the simulation of T_a on a given input and then, assuming that no conclusive answer has been reached, we abruptly abandon this calculation and simulate the running of the next string on T_a with the intention of returning to the simulation of the previous string at some later time and carrying it further. If we do this in an organized fashion, it will all work out.




Let us number the possible input strings st(1), st(2), st(3), . . . in the usual lexicographic order. Let us, for the moment, assume that our simulation machine T_g has four tracks. On the second track it generates, in order, all the integers (in a who-cares-which representation). Let us assume that at some point in the operation of T_g, track 2 has the number N on it.

Now on track 3 we generate, one by one, all possible input strings from st(1) up to st(N). Each time we generate another input string, we copy the string from track 3 to track 4 and simulate the running of T_a on it. But we only run the simulation for exactly N steps (this means N edges of the T_a program), that is, unless T_a crashes or halts before then. If N steps have not been enough to draw a T_a-membership conclusion on the input suggested by track 3, tough luck. We waste no more effort on this input string at this iteration. We erase track 4 and we go back down to track 3 and generate the next input string to be tested. If, however, the input string has been accepted within the N steps of the T_a simulation we are prepared to expend, then we print the input string on track 1 between appropriate #’s. We still erase track 4 and go back to track 3 for the next input string to be tested, but we have successfully found and printed a word in the language L.

When we go back down to track 3 to get the next string, we have to be sure that we have not already tried all the strings up to st(N). In order to be sure of this, we must keep a counter on track 2 telling us how many strings we have indeed produced. If we have not gone up to N yet, then we do produce the next string and repeat the process. If, however, we find that we have already gone up to our limit st(N), then what we must do is erase this track and increment track 2. Track 2 now has the contents N + 1 on it. We begin again to generate strings on track 3. We start once more with st(1) and test them to see if they are words accepted by T_a. We generate all the strings on track 3 from st(1) to st(N + 1) and one by one simulate on track 4 the running of them on T_a, for exactly N + 1 steps this time. Again, if they are neither accepted nor rejected, they are abandoned temporarily. If they are accepted, they are printed on track 1, even if they have been printed on track 1 already. The simulation of T_a on a particular input string begins at the very beginning START state of T_a, even though we have once before already simulated the first N steps of the processing. Maybe N steps were not enough, but N + 1 steps may do the trick. If no decision is made in N + 1 steps, then we erase track 4 and get the next input test case from track 3, unless we have already generated up to st(N + 1), in which case we erase track 3 and increment track 2 to N + 2.

Clearly, the only strings that appear on track 1 are the words that have been discovered to already be in L by having been accepted by T_a. It is also true that every word in L will eventually appear on track 1. This is because every word in L is accepted by T_a in some finite number of steps, say, M steps. Eventually, track 2 will reach M; this does not yet mean that the word will appear on this round of the iteration. Suppose that the word itself is string st(K) and K is bigger than M. Then when track 2 has reached M, track 4 will test all the strings from st(1) to st(M) for acceptance by T_a, but st(K) will not yet be tested. Once, however, track 2 reaches K, track 3 will generate st(K), track 4 will realize that it is accepted by T_a within K steps, and it will be printed on track 1. (If K is not bigger than M, the word is printed as soon as track 2 reaches M.) So, track 1 will eventually contain each of the words in L and only the words in L.

We can write this TM program in pseudocode as follows: 






1. Initialize track 2 to 0 and clear all other tracks.

2. Increment N on track 2 (i.e., N ← N + 1), set J ← 1, and clear tracks 3 and 4.

3. While J ≤ N: generate st(J) on track 3, copy it to track 4, simulate at most N steps of T_a on track 4, print st(J) on track 1 if T_a has accepted it, clear track 4, and set J ← J + 1.

4. Goto 2.
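Steps 1 through 4 can be mirrored in Python. This is an illustration under stated assumptions rather than a TM. T_a is modelled as a hypothetical bounded-run function run(w, n), which reports True if T_a accepts w within n steps, False if it crashes, and None if n steps were not enough. The toy machine toy_run, its step counts, and the skip_repeats flag are also assumptions. The flag implements the optional new-word check suggested in the text.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Optional

# run(w, n) models n steps of T_a on input w: True if T_a has accepted
# by then, False if it has crashed, None if no decision yet.
BoundedRun = Callable[[str, int], Optional[bool]]

def st(j: int, alphabet: str = "ab") -> str:
    """The j-th string (j >= 1) in size order: st(1) is the null string."""
    strings = ("".join(p) for n in count(0)
               for p in product(alphabet, repeat=n))
    return next(islice(strings, j - 1, None))

def generate(run: BoundedRun, skip_repeats: bool = True) -> Iterator[str]:
    """Dovetailing T_g built from T_a, following steps 1 through 4."""
    printed = set()
    for n in count(1):             # step 2: N <- N + 1 (step 4 loops back)
        for j in range(1, n + 1):  # step 3: J = 1, 2, ..., N
            w = st(j)              # track 3, copied to track 4
            if run(w, n) is True:  # at most N steps of T_a
                if skip_repeats and w in printed:
                    continue       # optional check for a new word
                printed.add(w)
                yield w            # print w on track 1 between #'s

def toy_run(w: str, n: int) -> Optional[bool]:
    """Hypothetical T_a: accepts words ending in b after
    20 * (number of a's) + 1 steps, and loops forever on all other words."""
    if w.endswith("b") and 20 * w.count("a") + 1 <= n:
        return True
    return None

print(list(islice(generate(toy_run), 6)))
# prints ['b', 'bb', 'bbb', 'ab', 'abb', 'bab']
```

With this toy machine, bbb is printed before ab, because ab needs 21 simulated steps while bbb needs only one. With skip_repeats=False, the word b is printed again on every round after the first.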


There are some issues that need to be addressed. The first is that once a word is accepted by N being large enough to generate it on track 3 and accept it on track 4, it will then also be generated on every subsequent iteration of step 3 in the algorithm. It will be generated as a test string, accepted by T_a, and printed on track 1 over and over. This is true, but it is not a damning complaint, because the definition of a language-generator allowed for repeated appearances of words in L on the Tape. But this is excessive. Without running the risk of looping forever, we could add a step to our procedure that checks to see whether st(J) is actually a new word before printing it on track 1.

Another quibble that needs to be thought through is that, although it is true that we have shown a multitrack TM can be simulated on a one-track TM, the simulation allowed the information from the other tracks to appear on the one-track TM Tape. That happened because this issue arose when we were still considering TMs solely as language-acceptors, and all that was important was whether we got to HALT or not on a given input. All that is different now. If we are to simulate a four-track TM on a one-track TM, how are we going to avoid putting garbage on the Tape that gets confused with the mission of L-language-word-generation? The answer is that we can simulate the different tracks on the TM separated by dividers other than the word delimiters used by T_g to indicate words generated in L. We could let track 1 be the first field with its numerous #’s and L words. Then we could put a special symbol on the Tape to indicate the beginning of track 2, let us say a “Ψ”. We could use another Ψ to separate the track 2 simulating field from the track 3 simulating field, and another to mark off track 4. These fields, even if bounded between Ψ’s, are arbitrarily expandable and contractible using the subroutines INSERT and DELETE. The TM Tape is thus


# word # word # . . . Ψ track 2 number Ψ track 3 test string Ψ track 4 T_a simulation

where the four stretches separated by the Ψ’s are field 1, field 2, field 3, and field 4, respectively.

Slowly but surely, the Tape will include every particular word of L between #’s in field 1 and 
only the words of L between the #’s. As field 1 grows, it will never erase that which it has 
calculated. The other fields will change and recede into oblivion. ■ 
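The field layout can be sketched in Python. The divider symbol, the unary representation of N, and the function names are all assumptions made for illustration. The sketch shows why choosing a divider other than # keeps field 1 readable: the generated words can be read back without any contamination from the other fields.

```python
from typing import List

DIVIDER = "Ψ"  # assumed divider; it must differ from # and from T_a's symbols

def one_track_tape(words: List[str], n: int, test: str, sim: str) -> str:
    """Pack the four simulated tracks into four fields of one Tape."""
    field1 = "#" + "".join(w + "#" for w in words)  # # word # word # ...
    field2 = "a" * n  # N on track 2, written here in unary
    return DIVIDER.join([field1, field2, test, sim])

def words_on_tape(tape: str) -> List[str]:
    """Read back the generated words: field 1 ends at the first divider."""
    field1 = tape.split(DIVIDER, 1)[0]
    # Strip the outer #'s; "##" would hold the null word.
    return field1[1:-1].split("#") if len(field1) > 1 else []

tape = one_track_tape(["b", "bb"], 3, "ab", "ab")
# tape == "#b#bb#ΨaaaΨabΨab"
print(words_on_tape(tape))  # prints ['b', 'bb']
```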

One thing we have to be careful about here is to realize that even if we have cleared up the repetition problem, the words that appear on the T_g Tape are not necessarily going to be the words in L in their usual lexicographic order. This means that the word bbb may appear first and the word ab, also in the language L, may only appear many, many cells later. The reason for this is that the T_a path to accept the word ab may be much longer (in steps) than the path to accept bbb, and so our T_g simulating machine will discover that bbb is an acceptable word first.

One might suggest, at this point in the discussion, that this problem may be easily cleared up by a simple expediency analogous to that which avoided duplications from appearing in field 1; namely, right before we go to write a word on track 1, why not just sort the words already there and insert the new word into its proper position? This is a fine suggestion but it does not solve the problem. Remember that T_g is an infinitely running machine. As we have defined it, it will even run forever to generate a finite language L. Step 4 in the algorithm always reverts back to step 2. This means that the occasion on which ab will be recognized as being a word in L and then be inserted on track 1 in front of bbb will be an unpredictable occurrence in the indefinite future.

Now one might suggest that this is all true of the inferior machine we have designed for T_g in the proof above, but a much smarter model language-generator for L might exist that does turn out the words of L in size order. The answer to this is that that is quite true, but only for some languages L, and not others, as the next theorem indicates.
THEOREM 89

The words in a language L can be generated by a TM in size order if and only if L is recursive. 

PROOF 

First, we shall show that if the language L is recursive, then it can be generated by some T_g in size order. This is easy. We take the machine we designed earlier to generate all strings in size order, but instead of running each of them only a limited amount in order to avoid entering an infinite loop, we start with a T_a for L that never loops at all. Such exist for all L’s that are recursive. Now we can test the strings in size order, simulate them finitely on T_a, and print them out on track 1 if and only if they reach HALT.

We shall now prove that if L is a language that can be generated by some T_g in size order, then L must be recursive. Out of the assumed order-generating T_g, we shall make a T_a that accepts L and rejects all of L'. This is also easy. Into T_a we input the string to be tested, and call it w. We then simulate the running of T_g until its output of words of L has progressed to the extent that the words being generated are larger than w. This will only take a finite amount of time. (If L is finite, it is regular and therefore recursive, so we may assume that L is infinite; an infinite L generated in size order must eventually produce words larger than w.) When we know the whole language L out as far as w, we simply check to see whether w is among the words generated thus far by T_g. If it is, we accept it; if not, we reject it. This is a complete decision procedure. ■
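Both halves of this proof can be sketched in Python. Here a Python predicate stands in for a T_a that never loops, and a Python generator stands in for T_g. The example language (strings with an even number of b’s) and all function names are assumptions made for illustration. Size order is modelled as comparing length first, then alphabetical order.

```python
from itertools import count, islice, product
from typing import Callable, Iterator, Tuple

def size_key(w: str) -> Tuple[int, str]:
    # Size order: shorter strings first, alphabetical within a length.
    return (len(w), w)

def all_strings(alphabet: str = "ab") -> Iterator[str]:
    for n in count(0):
        for p in product(alphabet, repeat=n):
            yield "".join(p)

def ordered_generator(decides: Callable[[str], bool]) -> Iterator[str]:
    """L recursive: run a T_a that never loops on every string in size
    order and print exactly the accepted ones."""
    for w in all_strings():
        if decides(w):
            yield w

def decider_from(generate: Callable[[], Iterator[str]]) -> Callable[[str], bool]:
    """L generated in size order: run T_g until it passes w."""
    def decides(w: str) -> bool:
        for word in generate():
            if word == w:
                return True   # w was generated: accept
            if size_key(word) > size_key(w):
                return False  # T_g has passed w, so w is not in L: reject
        return False          # T_g stopped without ever printing w
    return decides

def even_bs(w: str) -> bool:
    # Example recursive language (an assumption): an even number of b's.
    return w.count("b") % 2 == 0

print(list(islice(ordered_generator(even_bs), 5)))  # prints ['', 'a', 'aa', 'bb', 'aaa']
decides = decider_from(lambda: ordered_generator(even_bs))
print(decides("abab"), decides("ab"))  # prints True False
```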

Because not all languages are recursive, we know that, oddly enough, there are TMs that can generate certain languages L but never in size order. Actually, and subtly, this is not quite true. What we do know is that we cannot depend on these language-generating TMs to produce L in size order, but they just might do it anyway. It might just be the case that the associated T_a happens always to accept shorter words by shorter paths. We would, however, never know that this was going to happen reliably. We could never be sure that no word out of order is ever going to appear on the Tape. If we could be sure, then by the proof above, L would have to be recursive. This emphasizes the distinction between what is knowable and decidable and what may just happen adventitiously.

Another example of this distinction is the suggestion that instead of working so hard in the construction of T_g to avoid looping forever on inputs in loop(T_a), we could simply let this decision be made by nondeterminism. The nondeterministic TM to generate L simply (fortuitously) skips over all the troublesome words in loop(T_a) and simulates the acceptance of the good ones. If there is a nondeterministic TM to generate L, then we can turn it into a deterministic one, no? In light of the previous theorem, we know there must be something (or some things) wrong with this proposal. What they are, we leave for the Problems section.

As we can see, we have just begun to appreciate TMs; many interesting and important 
facts have not been covered (or even discovered). This is also true of PDAs and FAs. 

For a branch of knowledge so new, this subject has already reached some profound depth. Results in computer theory cannot avoid being of practical importance, but at the same time we have seen how clever and elegant they may be. This is a subject with twenty-first-century impact that yet retains its old world charm.

PROBLEMS

1. Trace these inputs on ADDER and explain what happens:

(i) aaba 

(ii) aab 

(iii) baaa 

(iv) b 




2. (i) Build a TM that takes an input of three numbers in unary encoding separated by b’s and leaves their sum on the Tape.

(ii) Build a TM that takes in any number of numbers in unary encoding separated by b’s 
and leaves their sum on the Tape. 

3. Describe how to build a binary adder that takes three numbers in at once in the form 

$(0 + 1)*$(0 + 1)*$(0 + 1)*

and leaves their binary total on the Tape. 

4. Outline a TM that acts as a binary-to-unary converter, that is, it starts with a number in 
binary on the Tape 

$(0 + 1)*$ 

and leaves the equivalent number encoded in unary notation. 

5. Trace these inputs on MINUS and explain what happens:

(i) aaabaa 

(ii) abaaa 

(iii) baa 

(iv) aaab 

6. Modify the TM MINUS so that it rejects all inputs not in the form 

ba*ba* 

and converts ba^n ba^m into ba^(n − m).

7. MINUS does proper subtraction on unary encoded numbers. Build a TM that does proper subtraction on binary encoded inputs.

8. Run the following input strings on the machine MAX built in the proof of Theorem 83 (p. 601):

(i) aaaba 

(ii) baaa (Interpret this.) 

(iii) aabaa 

(iv) In the TM MAX above, where does the Tape Head end up if the second number is 
larger than the first? 

(v) Where does it end if they are equal? 

(vi) Where does it finish if the first is larger? 

9. MAX is a unary machine; that is, it presumes its input numbers are fed into it in unary 
encoding. Build a machine (TM) that does the job of MAX on binary encoded input. 

10. Build a TM that takes in three numbers in unary encoding and leaves only the largest of 
them on the Tape. 

11. Trace the following strings on IDENTITY and SUCCESSOR: 

(i) aa 

(ii) aaaba 

12. Build machines that perform the same function as IDENTITY and SUCCESSOR but on 
binary encoded input. 

13. Trace the input string 


bbaaababaaba 









on SELECT/3/5, stopping where the program given in the proof of Theorem 85 ends, 
that is, without the use of DELETE A. 

14. In the text, we showed that there was a different TM for SELECT/i/n for each different set of i and n. However, it is possible to design a TM that takes in a string and interprets the initial clump of a’s as the unary encoding of the number i. It then considers the word remaining as the encoding of the string of numbers from which we must select the ith.

(i) Design such a TM. 

(ii) Run this machine on the input 

aabaaabaabaaba 

15. On the TM MPY, from the proof of Theorem 86 (p. 605), trace the following inputs: 

(i) babaa

(ii) baaaba 

16. Modify MPY so that it allows us to multiply by 0. 

17. Sketch roughly a TM that performs multiplication on binary inputs. 

18. Prove that division is computable by building a TM that accepts the input string ba^m ba^n and leaves the string ba^q ba^r on the Tape, where q is the quotient of m divided by n and r is the remainder.

19. Show that a TM can decide whether or not the number n is prime. This means that a TM exists called PRIME that, when given the input a^n, will run and halt, leaving a 1 in cell i if n is a prime and a 0 in cell i if n is not prime.

20. What is wrong with the nondeterministic approach to building an ordered language generator as described on p. 616?